Abstract



Building on the inequalities for homogeneous tetrahedral polynomials in independent
Gaussian variables due to R. Latała, we provide a concentration inequality for not necessarily
Lipschitz functions $f\colon \mathbb{R}^n \to \mathbb{R}$ with bounded derivatives of higher orders, which holds when
the underlying measure satisfies a family of Sobolev type inequalities

$$\|g - \mathbb{E}g\|_p \le C(p)\,\|\nabla g\|_p.$$

Such Sobolev type inequalities hold, e.g., if the underlying measure satisfies the log-Sobolev
inequality (in which case $C(p) \le C\sqrt{p}$) or the Poincaré inequality (then $C(p) \le Cp$). Our
concentration estimates are expressed in terms of tensor-product norms of the derivatives of $f$.

When the underlying measure is Gaussian and $f$ is a polynomial (not necessarily tetrahedral
or homogeneous), our estimates can be reversed (up to a constant depending only
on the degree of the polynomial). We also show that for polynomial functions, analogous
estimates hold for arbitrary random vectors with independent sub-Gaussian coordinates.
We apply our inequalities to general additive functionals of random vectors (in particular
linear eigenvalue statistics of random matrices) and the problem of counting cycles of fixed
length in Erdős-Rényi random graphs, obtaining new estimates, optimal in a certain range
of parameters.
1 Introduction

Concentration of measure inequalities are one of the basic tools in modern probability theory
(see the monograph [45]). The prototypic result for all concentration theorems is arguably the
Gaussian concentration inequality [14, 60], which asserts that if $G$ is a standard Gaussian vector
in $\mathbb{R}^n$ and $f\colon \mathbb{R}^n \to \mathbb{R}$ is a 1-Lipschitz function, then for all $t > 0$,

$$\mathbb{P}(|f(G) - \mathbb{E}f(G)| \ge t) \le 2\exp(-t^2/2).$$
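As a quick sanity check of the inequality above, the following sketch estimates the tail by Monte Carlo for one particular 1-Lipschitz function, the Euclidean norm $f(x) = |x|$. The dimension, sample size and seed are arbitrary illustrative choices (not taken from the text), and the empirical mean is used as a stand-in for $\mathbb{E}f(G)$.

```python
import math
import random


def empirical_tail(n, t, samples, seed=0):
    """Monte Carlo estimate of P(|f(G) - E f(G)| >= t) for f(x) = |x|."""
    rng = random.Random(seed)
    values = [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
        for _ in range(samples)
    ]
    mean = sum(values) / samples  # empirical proxy for E f(G)
    return sum(abs(v - mean) >= t for v in values) / samples


def gaussian_bound(t):
    """Right-hand side 2 exp(-t^2 / 2) of the Gaussian concentration inequality."""
    return 2.0 * math.exp(-t * t / 2.0)


if __name__ == "__main__":
    for t in (1.0, 1.5, 2.0):
        print(t, empirical_tail(n=20, t=t, samples=5000), gaussian_bound(t))
```

The bound does not depend on the dimension `n`, which is the dimension-free feature discussed in the text.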

Over the years the above inequality has found numerous applications in the analysis of Gaussian
processes, as well as in asymptotic geometric analysis (e.g. in modern proofs of Dvoretzky
type theorems). Its applicability in geometric situations comes from the fact that it is dimension
free and all norms in $\mathbb{R}^n$ are Lipschitz with respect to one another. However, there are
some probabilistic or combinatorial situations in which one is concerned with functions that are
not Lipschitz. The most basic case is the probabilistic analysis of polynomials in independent
random variables, which arise naturally, e.g., in the study of multiple stochastic integrals, in 
discrete harmonic analysis as elements of the Fourier expansions on the discrete cube or in 
numerous problems of random graph theory, to mention just the famous subgraph counting 
problem [35, 34, 22, 26, 25]. 

The concentration of measure, or more generally integrability properties, for polynomials have
attracted a lot of attention in the last forty years. In particular Bonami [13] and Nelson [53]
provided hypercontractive estimates (Khintchine type inequalities) for polynomials on the discrete
cube and in the Gauss space, which were later extended to other random variables by
Kwapień and Szulga [40] (see also [41]). Khintchine type inequalities have also been obtained
in the absence of independence for polynomials under log-concave measures by Bourgain [19],
Bobkov [10], Nazarov-Sodin-Volberg [52] and Carbery-Wright [21].

Another line of research is to provide two-sided estimates of moments of polynomials in
terms of deterministic functions of the coefficients. Borell [15] and Arcones-Giné [5] provided
such two-sided bounds for homogeneous polynomials in Gaussian variables. They were expressed
in terms of expectations of suprema of certain empirical processes. Talagrand [62] and
Bousquet-Boucheron-Lugosi-Massart [18, 17] obtained counterparts of these results for homogeneous
tetrahedral polynomials in Rademacher variables, and Łochowski [47] and Adamczak
[1] for random variables with log-concave tails. Inequalities of this type, while implying (up to
constants) hypercontractive bounds, have a serious downside, as the analysis of the empirical
processes involved is in general difficult. It is therefore important to obtain two-sided bounds
in terms of purely deterministic quantities. Such bounds for random quadratic forms in independent
symmetric random variables with log-concave tails have been obtained by Latała [42]
(the case of linear forms was solved earlier by Gluskin and Kwapień in [28], whereas bounds
for quadratic forms in Gaussian variables were obtained by Hanson-Wright [31], Borell [15] and
Arcones-Giné [5]). Their counterparts for multilinear forms of arbitrary degree in nonnegative
random variables with log-concave tails have been derived by Latała and Łochowski [44]. As
for the symmetric case, the general problem is still open. An important breakthrough has been
obtained by Latała [43], who proved two-sided estimates for Gaussian chaoses of arbitrary order,
that is for homogeneous tetrahedral polynomials of arbitrary degree in independent Gaussian
variables (we recall his bounds below as they are the starting point for our investigations). For
general symmetric random variables with log-concave tails similar bounds are known only for
chaoses of order at most three [2].

Polynomials in independent random variables have also been investigated in relation with
combinatorial problems, e.g. with subgraph counting [35, 34, 22, 26, 25]. The best known result
for general polynomials in this area has been obtained by Kim and Vu [36, 63], who presented
a family of powerful inequalities for $[0, 1]$-valued random variables. Over the last decade they
have been applied successfully to handle many problems in probabilistic combinatorics. Some
recent inequalities for polynomials in the so-called subexponential random variables have
also been obtained by Schudy and Sviridenko [58, 57]. They are a generalization of the special case
of exponential random variables in [44] and are expressed in terms of quantities similar to those
considered by Kim-Vu.

Since it is beyond the scope of this paper to give a precise account of all the concentration 
inequalities for polynomials, we refer the reader to the aforementioned sources and recommend 
also the monographs [41, 23], where some parts of the theory are presented in a uniform way. 
As already mentioned we will present in detail only the results from [43], which are our main 
tool as well as motivation. 

As for concentration results for general non-Lipschitz functions, the only reference we are
aware of which addresses this question is [29], where the authors obtain interesting inequalities
for stationary measures of certain Markov processes and functions satisfying a Lyapunov type
condition. Their bounds are not comparable to the ones which we present in this paper. On
the one hand they work in a more general Markov process setting, on the other hand, when
specialized, e.g., to quadratic forms of Gaussian vectors, they do not recover optimal inequalities
given in [15, 5, 43] (see Section 4 in [29]). Since the language of [29] is very different from ours, we
will not describe the inequalities obtained therein and refer the interested reader to the original
paper.

Let us now proceed to the presentation of our results. To do this we will first formulate a
two-sided tail and moment inequality for homogeneous tetrahedral polynomials in i.i.d. standard
Gaussian variables due to Latała [43]. To present it in a concise way we need to introduce
some notation which we will use throughout the article. For a positive integer $n$ we will denote
$[n] = \{1, \ldots, n\}$. The cardinality of a set $I$ will be denoted by $\#I$. For $\mathbf{i} = (i_1, \ldots, i_d) \in [n]^d$
and $I \subseteq [d]$ we write $\mathbf{i}_I = (i_k)_{k \in I}$. We will also denote $|\mathbf{i}| = \max_{j \le d} i_j$.

Consider thus a $d$-indexed matrix $A = (a_{i_1,\ldots,i_d})_{i_1,\ldots,i_d=1}^n$, such that $a_{i_1,\ldots,i_d} = 0$ whenever
$i_j = i_k$ for some $j \ne k$, a sequence $g_1, \ldots, g_n$ of i.i.d. $\mathcal{N}(0, 1)$ random variables and define

$$Z = \sum_{\mathbf{i} \in [n]^d} a_{\mathbf{i}}\, g_{i_1} \cdots g_{i_d}. \tag{1}$$

Without loss of generality we can assume that the matrix $A$ is symmetric, i.e., for all permutations
$\sigma\colon [d] \to [d]$, $a_{i_1,\ldots,i_d} = a_{i_{\sigma(1)},\ldots,i_{\sigma(d)}}$.

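Let now $P_d$ be the set of partitions of $\{1, \ldots, d\}$ into nonempty, pairwise disjoint sets. For
a partition $\mathcal{J} = \{J_1, \ldots, J_k\}$ and a $d$-indexed matrix $A = (a_{\mathbf{i}})_{\mathbf{i} \in [n]^d}$ (not necessarily symmetric
or with zeros on the diagonal), define

$$\|A\|_{\mathcal{J}} = \sup\Big\{ \sum_{\mathbf{i} \in [n]^d} a_{\mathbf{i}} \prod_{l=1}^{k} x^{(l)}_{\mathbf{i}_{J_l}} \colon \big\|(x^{(l)}_{\mathbf{i}_{J_l}})\big\|_2 \le 1,\ 1 \le l \le k \Big\}, \tag{2}$$

where $\|(x_{\mathbf{i}_{J_l}})\|_2 = \sqrt{\sum_{|\mathbf{i}_{J_l}| \le n} x_{\mathbf{i}_{J_l}}^2}$. Thus, e.g.,

$$\|(a_{ij})_{i,j \le n}\|_{\{1,2\}} = \sup\Big\{ \sum_{i,j \le n} a_{ij} x_{ij} \colon \sum_{i,j \le n} x_{ij}^2 \le 1 \Big\} = \sqrt{\sum_{i,j \le n} a_{ij}^2} = \|(a_{ij})_{i,j \le n}\|_{HS},$$

$$\|(a_{ij})_{i,j \le n}\|_{\{1\}\{2\}} = \sup\Big\{ \sum_{i,j \le n} a_{ij} x_i y_j \colon \sum_{i \le n} x_i^2 \le 1,\ \sum_{j \le n} y_j^2 \le 1 \Big\} = \|(a_{ij})_{i,j \le n}\|_{\ell_2^n \to \ell_2^n},$$

$$\|(a_{ijk})_{i,j,k \le n}\|_{\{1,2\}\{3\}} = \sup\Big\{ \sum_{i,j,k \le n} a_{ijk} x_{ij} y_k \colon \sum_{i,j \le n} x_{ij}^2 \le 1,\ \sum_{k \le n} y_k^2 \le 1 \Big\}.$$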

From the functional analytic perspective the above norms are injective tensor product norms
of $A$ seen as a multilinear form on $(\mathbb{R}^n)^d$ with the standard Euclidean structure.
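To make the objects above concrete, here is a minimal sketch that enumerates the partitions in $P_d$ and evaluates the two norms available for $d = 2$: the Hilbert-Schmidt norm (partition $\{1,2\}$) and the operator norm (partition $\{1\}\{2\}$). The function names are illustrative only, matrices are plain lists of rows, and the operator norm is approximated by power iteration, which is an assumption of this sketch rather than a method used in the paper.

```python
import math


def set_partitions(elements):
    """Yield all partitions of a list into nonempty, pairwise disjoint blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or open a new block for it.
        yield [[first]] + partition


def hs_norm(a):
    """||A||_{{1,2}}: the Hilbert-Schmidt norm of a matrix."""
    return math.sqrt(sum(x * x for row in a for x in row))


def op_norm(a, iterations=500):
    """||A||_{{1},{2}}: the l2 -> l2 operator norm, via power iteration on A^T A."""
    rows, cols = len(a), len(a[0])
    v = [1.0 / (j + 1) for j in range(cols)]  # generic start vector (assumption)
    for _ in range(iterations):
        av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(a[i][j] * av[i] for i in range(rows)) for j in range(cols)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
    av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in av))


if __name__ == "__main__":
    print(list(set_partitions([1, 2, 3])))   # the five partitions of {1, 2, 3}
    a = [[3.0, 0.0], [0.0, 4.0]]
    print(hs_norm(a), op_norm(a))
```

For a three-indexed matrix, the norm for $\{1,2\}\{3\}$ can be obtained in the same way by flattening the first two indices and applying `op_norm` to the resulting $n^2 \times n$ matrix.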

We are now ready to present the inequalities by Latała. Below, as in the whole article,
by $C_d$ we denote a constant which depends only on $d$. The values of $C_d$ may differ between
occurrences.

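Theorem 1.1. For any $d$-indexed symmetric matrix $A = (a_{\mathbf{i}})_{\mathbf{i} \in [n]^d}$ such that $a_{\mathbf{i}} = 0$ if $i_j = i_k$
for some $j \ne k$, the random variable $Z$ defined by (1) satisfies for all $p \ge 2$,

$$C_d^{-1} \sum_{\mathcal{J} \in P_d} p^{\#\mathcal{J}/2} \|A\|_{\mathcal{J}} \le \|Z\|_p \le C_d \sum_{\mathcal{J} \in P_d} p^{\#\mathcal{J}/2} \|A\|_{\mathcal{J}}.$$

As a consequence, for all $t > 1$,

$$C_d^{-1} \exp\Big( -C_d \min_{\mathcal{J} \in P_d} \Big( \frac{t}{\|A\|_{\mathcal{J}}} \Big)^{2/\#\mathcal{J}} \Big) \le \mathbb{P}(|Z| \ge t) \le C_d \exp\Big( -\frac{1}{C_d} \min_{\mathcal{J} \in P_d} \Big( \frac{t}{\|A\|_{\mathcal{J}}} \Big)^{2/\#\mathcal{J}} \Big).$$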



It is worthwhile noting that for $\#\mathcal{J} > 1$, the norms $\|\cdot\|_{\mathcal{J}}$ are not unconditional in the
standard basis (decreasing coefficients of the matrix may not result in decreasing the norm).
Moreover, for specific matrices they may not be easy to compute. On the other hand, for any
$d$-indexed matrix $A$ and any $\mathcal{J} \in P_d$, we have $\|A\|_{\mathcal{J}} \le \|A\|_{\{1,\ldots,d\}} = \sqrt{\sum_{\mathbf{i}} a_{\mathbf{i}}^2}$. Using this fact in
the upper estimates above allows one to recover (up to constants depending on $d$) hypercontractive
estimates for homogeneous tetrahedral polynomials due to Nelson.

Our main result is an extension of the upper bound given in the above theorem to more 
general random functions and measures. Below we present the most basic setting we will work 
with and state the corresponding theorems. Some additional extensions are deferred to the main 
body of the article. 

We will consider a random vector $X$ in $\mathbb{R}^n$ which satisfies the following family of Sobolev
inequalities. For any $p \ge 2$ and any smooth integrable function $f\colon \mathbb{R}^n \to \mathbb{R}$,

$$\|f(X) - \mathbb{E}f(X)\|_p \le L\sqrt{p}\, \big\| |\nabla f(X)| \big\|_p, \tag{3}$$

for some constant $L$ (independent of $p$ and $f$), where $|\cdot|$ is the standard Euclidean norm on
$\mathbb{R}^n$. It is known (see [3] and Theorem 3.4 below) that if $X$ satisfies the logarithmic Sobolev
inequality with constant $D_{LS}$, then it satisfies (3) with $L = \sqrt{D_{LS}/2}$. We remark that there
are many criteria for a random vector to satisfy the logarithmic Sobolev inequality (see e.g.
[45, 7, 11, 8, 38]), so in particular our assumption (3) can be verified for many random vectors
of interest.

Our first result is the following theorem, which provides moment estimates and concentration
for $D$-times differentiable functions. The estimates are expressed by $\|\cdot\|_{\mathcal{J}}$ norms of derivatives
of the function (which we will identify with multi-indexed matrices). We will denote the $d$-th
derivative of $f$ by $\mathbf{D}^d f$.

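Theorem 1.2. Assume that a random vector $X$ in $\mathbb{R}^n$ satisfies the inequality (3) with constant
$L$. Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be a function of the class $C^D$. For all $p \ge 2$, if $\mathbf{D}^D f(X) \in L^p$, then

$$\|f(X) - \mathbb{E}f(X)\|_p \le C_D \Big( L^D \sum_{\mathcal{J} \in P_D} p^{\#\mathcal{J}/2} \big\| \|\mathbf{D}^D f(X)\|_{\mathcal{J}} \big\|_p + \sum_{1 \le d \le D-1} L^d \sum_{\mathcal{J} \in P_d} p^{\#\mathcal{J}/2} \|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}} \Big).$$

In particular, if $\mathbf{D}^D f(x)$ is uniformly bounded on $\mathbb{R}^n$, then setting

$$\eta_f(t) = \min\Big( \min_{\mathcal{J} \in P_D} \Big( \frac{t}{L^D \sup_{x \in \mathbb{R}^n} \|\mathbf{D}^D f(x)\|_{\mathcal{J}}} \Big)^{2/\#\mathcal{J}},\ \min_{1 \le d \le D-1} \min_{\mathcal{J} \in P_d} \Big( \frac{t}{L^d \|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}} \Big)^{2/\#\mathcal{J}} \Big)$$

we obtain for $t > 0$,

$$\mathbb{P}(|f(X) - \mathbb{E}f(X)| \ge t) \le 2\exp\Big( -\frac{1}{C_D} \eta_f(t) \Big).$$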

The above theorem is quite technical, so we will now provide a few comments, comparing it 
to known results. 

1. It is easy to see that if D = 1, Theorem 1.2 reduces (up to absolute constants) to the 
Gaussian-like concentration inequality, which can be obtained from (3) by Chebyshev's inequality 
(applied to general p and optimized). 
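The optimization over $p$ mentioned in comment 1 can be sketched numerically. Assume for illustration that $f$ is 1-Lipschitz, so that (3) gives $\|f(X) - \mathbb{E}f(X)\|_p \le L\sqrt{p}$; Chebyshev's (Markov's) inequality then yields $\mathbb{P}(|f(X) - \mathbb{E}f(X)| \ge t) \le (L\sqrt{p}/t)^p$ for every $p \ge 2$. The code below minimizes this bound over $p$ and compares it with the value $\exp(-t^2/(2eL^2))$ at the stationary point $p = t^2/(eL^2)$. The function names and the ternary search are choices of this sketch.

```python
import math


def log_markov_bound(p, t, L):
    """log of (L * sqrt(p) / t)^p: Markov's inequality for the p-th moment,
    under the assumed moment bound ||f(X) - E f(X)||_p <= L * sqrt(p)."""
    return p * math.log(L * math.sqrt(p) / t)


def optimized_tail_bound(t, L, p_max=1e6):
    """Minimize the bound over p in [2, p_max] by ternary search
    (the logarithm of the bound is convex in p)."""
    lo, hi = 2.0, p_max
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_markov_bound(m1, t, L) <= log_markov_bound(m2, t, L):
            hi = m2
        else:
            lo = m1
    return math.exp(log_markov_bound(0.5 * (lo + hi), t, L))


def closed_form(t, L):
    """Value of the bound at p = t^2 / (e L^2), valid when that p is at least 2."""
    return math.exp(-t * t / (2.0 * math.e * L * L))


if __name__ == "__main__":
    for t in (3.0, 5.0, 8.0):
        print(t, optimized_tail_bound(t, 1.0), closed_form(t, 1.0))
```

When $t^2/(eL^2) < 2$ the constraint $p \ge 2$ is active and the closed form no longer applies; this is the regime in which the tail bound is made trivial by adjusting constants.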

2. If $f$ is a homogeneous tetrahedral polynomial of degree $D$, then the tail and moment estimates
of Theorem 1.2 coincide with those from Latała's Theorem. Thus Theorem 1.2 provides
an extension of the upper bound from Latała's result to a larger class of measures and functions
(however we would like to stress that our proof relies heavily on Latała's work).



3. If $f$ is a general polynomial of degree $D$, then $\mathbf{D}^D f(x)$ is constant on $\mathbb{R}^n$ (and thus equal
to $\mathbb{E}\mathbf{D}^D f(X)$). Therefore in this case the function $\eta_f$ appearing in Theorem 1.2 can be written
in a simplified form

$$\eta_f(t) = \min_{1 \le d \le D} \min_{\mathcal{J} \in P_d} \Big( \frac{t}{L^d \|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}} \Big)^{2/\#\mathcal{J}}. \tag{4}$$
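Once the norms of the expected derivatives are known, evaluating the simplified function in (4) is a finite minimization. The sketch below assumes a hypothetical input format: a dictionary mapping the pair ($d$, number of blocks of the partition) to the largest norm $\|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}$ among partitions of $[d]$ with that many blocks. For a fixed number of blocks the minimum in (4) is attained at the largest norm, so no information is lost by this grouping.

```python
def eta(t, L, norms):
    """Evaluate (4): min over (d, J) of (t / (L^d * ||E D^d f(X)||_J))^(2 / #J).

    `norms` maps (d, number_of_blocks) to the largest norm among partitions of [d]
    with that many blocks (hypothetical format chosen for this sketch).
    Entries equal to zero impose no constraint and are skipped.
    """
    best = float("inf")
    for (d, blocks), value in norms.items():
        if value > 0.0:
            best = min(best, (t / (L ** d * value)) ** (2.0 / blocks))
    return best


if __name__ == "__main__":
    # Illustrative numbers for a degree-2 polynomial with vanishing expected gradient:
    # Hilbert-Schmidt norm 5 (one block) and operator norm 4 (two blocks) of E D^2 f(X).
    example = {(1, 1): 0.0, (2, 1): 5.0, (2, 2): 4.0}
    for t in (1.0, 10.0, 100.0):
        print(t, eta(t, 1.0, example))
```

In this example the one-block term $(t/5)^2$ governs small $t$ (Gaussian regime) and the two-block term $t/4$ governs large $t$ (exponential regime), the familiar two-regime behaviour of quadratic forms.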

4. For polynomials in Gaussian variables, the estimates given in Theorem 1.2 can be reversed, 
like in Theorem 1.1. More precisely we have the following theorem, which provides an extension 
of Theorem 1.1 to general polynomials. 

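Theorem 1.3. If $G$ is a standard Gaussian vector in $\mathbb{R}^n$ and $f\colon \mathbb{R}^n \to \mathbb{R}$ is a polynomial of
degree $D$, then for all $p \ge 2$,

$$C_D^{-1} \sum_{1 \le d \le D} \sum_{\mathcal{J} \in P_d} p^{\#\mathcal{J}/2} \|\mathbb{E}\mathbf{D}^d f(G)\|_{\mathcal{J}} \le \|f(G) - \mathbb{E}f(G)\|_p \le C_D \sum_{1 \le d \le D} \sum_{\mathcal{J} \in P_d} p^{\#\mathcal{J}/2} \|\mathbb{E}\mathbf{D}^d f(G)\|_{\mathcal{J}}.$$

Moreover for all $t > 0$,

$$\frac{1}{C_D} \exp\big( -C_D\, \eta_f(t) \big) \le \mathbb{P}(|f(G) - \mathbb{E}f(G)| \ge t) \le C_D \exp\Big( -\frac{1}{C_D} \eta_f(t) \Big),$$

where

$$\eta_f(t) = \min_{1 \le d \le D} \min_{\mathcal{J} \in P_d} \Big( \frac{t}{\|\mathbb{E}\mathbf{D}^d f(G)\|_{\mathcal{J}}} \Big)^{2/\#\mathcal{J}}.$$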



5. It is well known that concentration of measure for general Lipschitz functions fails e.g. on 
the discrete cube and one has to impose some additional convexity assumptions to get sub- 
Gaussian concentration [61]. It turns out that if we restrict to polynomials, estimates in the 
spirit of Theorems 1.1 and 1.2 still hold. To formulate our result in full generality recall the 
definition of the $\psi_2$ Orlicz norm of a random variable $Y$,

$$\|Y\|_{\psi_2} = \inf\Big\{ t > 0 \colon \mathbb{E}\exp\Big( \frac{Y^2}{t^2} \Big) \le 2 \Big\}.$$

By integration by parts and Chebyshev's inequality, $\|Y\|_{\psi_2} < \infty$ is equivalent to a sub-Gaussian
tail decay for $Y$. We have the following result for polynomials in sub-Gaussian random vectors
with independent components.

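Theorem 1.4. Let $X = (X_1, \ldots, X_n)$ be a random vector with independent components, such
that for all $i \le n$, $\|X_i\|_{\psi_2} \le L$. Then for every polynomial $f\colon \mathbb{R}^n \to \mathbb{R}$ of degree $D$ and every
$p \ge 2$,

$$\|f(X) - \mathbb{E}f(X)\|_p \le C_D \sum_{d=1}^{D} L^d \sum_{\mathcal{J} \in P_d} p^{\#\mathcal{J}/2} \|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}.$$

As a consequence, for any $t > 0$,

$$\mathbb{P}(|f(X) - \mathbb{E}f(X)| \ge t) \le 2\exp\Big( -\frac{1}{C_D} \eta_f(t) \Big),$$

where

$$\eta_f(t) = \min_{1 \le d \le D} \min_{\mathcal{J} \in P_d} \Big( \frac{t}{L^d \|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}} \Big)^{2/\#\mathcal{J}}.$$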



6. We postpone the applications of our theorems to subsequent sections of the article and here
we announce only that apart from polynomials we apply Theorem 1.2 to additive functionals and
$U$-statistics of random vectors, in particular to linear eigenvalue statistics of random matrices,
obtaining bounds which complement known estimates by Guionnet and Zeitouni [30]. Theorem
1.4 is applied to the special case of the problem of subgraph counting in large random graphs. In
a special case when one counts copies of a given small cycle, our result allows one to obtain optimal
inequalities for random graphs $G(n,p)$ with $p \to 0$ slowly, namely $p \ge n^{-\frac{k-2}{2(k-1)}} \log^{-\frac{1}{2}} n$, where $k$
is the length of the cycle. To the best of our knowledge they are the best currently known inequalities
for this range of $p$.

7. Let us now briefly discuss optimality of our inequalities. The lower bound in Theorem 1.3 
clearly shows that Theorem 1.2 is optimal in the class of measures and functions it covers up to 
constants depending only on D. As for Theorem 1.4, it is similarly optimal in the class of random 
vectors with independent sub-Gaussian coordinates. In concrete combinatorial applications, for 
0-1 random variables this theorem may be however suboptimal. This can be seen already for 
$D = 1$, for a linear combination of independent Bernoulli variables $X_1, \ldots, X_n$ with
$\mathbb{P}(X_i = 1) = 1 - \mathbb{P}(X_i = 0) = p$. When $p$ becomes small, the tail bound for such variables given e.g. by the
Chernoff inequality is more subtle than what can be obtained from general inequalities for sums
of sub-Gaussian random variables and the fact that $\|X_i\|_{\psi_2}$ is of order $(\log(2/p))^{-1/2}$. Roughly
speaking, this is the reason why in our estimates for random graphs we have a restriction on the
speed at which $p \to 0$. At the same time our inequalities still give results comparable to what can
be obtained from other general inequalities for polynomials. As already noted in the survey [35], 
bounds obtained from various general inequalities for the subgraph-counting problem, may not 
be directly comparable, i.e. those performing well in one case may exhibit worse performance in 
some other cases. Similarly, our inequalities cannot be in general compared e.g. to the estimates 
by Kim and Vu. For this reason and since it would require introducing new notation, we will 
not discuss these inequalities and just indicate, when presenting applications of Theorem 1.4, 
several situations when our inequalities perform in a better or worse way than those by Kim 
and Vu. Let us only mention that the Kim-Vu inequalities similarly as ours are expressed in 
terms of higher order derivatives of the polynomials. However, Kim and Vu (as well as Schudy 
and Sviridenko) look at maxima of absolute values of partial derivatives, which does not lead to 
tensor-product norms which we consider. While in the general sub-Gaussian case we consider, 
such tensor product norms cannot be avoided (in view of Theorem 1.3), it is not necessarily the 
case for 0-1 random variables. 
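The order of the $\psi_2$ norm of a Bernoulli variable quoted in comment 7 can be checked directly from the definition. The following sketch computes $\inf\{t > 0 \colon \mathbb{E}\exp(Y^2/t^2) \le 2\}$ by bisection for a finitely supported $Y$; the function name, the bisection scheme and the overflow guard are choices of this illustration. For a Bernoulli variable with parameter $p$ the defining condition reads $1 - p + p\,e^{1/t^2} \le 2$, which gives the exact value $(\log(1 + 1/p))^{-1/2}$, of the same order as $(\log(2/p))^{-1/2}$ for small $p$.

```python
import math


def psi2_norm(values, probs, iterations=200):
    """Bisection for inf{t > 0 : E exp(Y^2 / t^2) <= 2}, Y finitely supported."""
    if all(v == 0 for v in values):
        return 0.0

    def feasible(t):
        total = 0.0
        for v, p in zip(values, probs):
            exponent = (v / t) ** 2
            if exponent > 700.0:
                # exp would overflow; treated as infeasible for any atom of
                # non-negligible mass (simplifying assumption of this sketch)
                if p > 0.0:
                    return False
                continue
            total += p * math.exp(exponent)
        return total <= 2.0

    lo, hi = 0.0, max(abs(v) for v in values)
    while not feasible(hi):
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid > 0.0 and feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    print("Rademacher:", psi2_norm([-1.0, 1.0], [0.5, 0.5]))
    for p in (0.5, 0.1, 0.01, 0.001):
        print(p, psi2_norm([0.0, 1.0], [1.0 - p, p]), math.log(2.0 / p) ** -0.5)
```

The norm decays only logarithmically as $p \to 0$, whereas the standard deviation $\sqrt{p(1-p)}$ decays polynomially, which is the loss described in the text.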

The organization of the paper is as follows. First, in Section 2, we introduce the notation 
used in the paper, next in Section 3 we give the proof of Theorem 1.2 together with some 
generalizations and examples of applications. In Section 4 we prove Theorem 1.3, whereas 
in Section 5 we present the proof of Theorem 1.4 and applications to the subgraph counting 
problems. In Section 6 we provide further refinements of estimates from Section 3 in the case 
of independent random variables satisfying modified log-Sobolev inequalities (they are deferred 
to the end of the article as they are more technical than those of Section 3) . In the Appendix 
we collect some additional facts used in the proofs. 

Acknowledgement We would like to thank Michel Ledoux and Sandrine Dallaporta for in- 
2 Notation

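Sets and indices. For a positive integer $n$ we will denote $[n] = \{1, \ldots, n\}$. The cardinality of
a set $I$ will be denoted by $\#I$.

For $\mathbf{i} = (i_1, \ldots, i_d) \in [n]^d$ and $I \subseteq [d]$ we write $\mathbf{i}_I = (i_k)_{k \in I}$. We will also denote
$|\mathbf{i}| = \max_{j \le d} i_j$.

For a finite set $A$ and an integer $d \ge 0$ we set

$$A^{\underline{d}} = \{ \mathbf{i} = (i_1, \ldots, i_d) \in A^d \colon \forall_{j,k \in \{1,\ldots,d\}}\ j \ne k \Rightarrow i_j \ne i_k \}$$

(i.e. $A^{\underline{d}}$ is the set of $d$-indices with pairwise distinct coordinates). Accordingly we will denote
$n^{\underline{d}} = n(n-1) \cdots (n-d+1)$.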
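As a small illustration of this notation, the set of $d$-indices with pairwise distinct coordinates can be enumerated with the standard library, and its cardinality compared with the falling factorial. The helper names are illustrative only.

```python
from itertools import permutations


def distinct_indices(n, d):
    """All d-indices in [n]^d with pairwise distinct coordinates."""
    return list(permutations(range(1, n + 1), d))


def falling_factorial(n, d):
    """n (n - 1) ... (n - d + 1), with the empty product equal to 1."""
    result = 1
    for k in range(d):
        result *= n - k
    return result


if __name__ == "__main__":
    print(len(distinct_indices(5, 3)), falling_factorial(5, 3))
```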

By $P_d$ we will denote the family of partitions of $[d]$ into nonempty, pairwise disjoint sets.

For a finite set $I$, by $\ell_2(I)$ we will denote the finite dimensional Euclidean space $\mathbb{R}^I$ endowed
with the standard Euclidean norm $|x|_2 = \sqrt{\sum_{i \in I} x_i^2}$. Whenever there is no risk of confusion we
will denote the standard Euclidean norm simply by $|\cdot|$.

Multi-indexed matrices. For a function $f\colon \mathbb{R}^n \to \mathbb{R}$, by $\mathbf{D}^d f(x)$ we will denote the ($d$-indexed)
matrix of its derivatives of order $d$, which we will identify with the corresponding symmetric
$d$-linear form. If $M = (M_{\mathbf{i}})_{\mathbf{i} \in [n]^d}$, $N = (N_{\mathbf{i}})_{\mathbf{i} \in [n]^d}$ are $d$-indexed matrices, we define
$\langle M, N \rangle = \sum_{\mathbf{i} \in [n]^d} M_{\mathbf{i}} N_{\mathbf{i}}$. Thus for all vectors $y^1, \ldots, y^d \in \mathbb{R}^n$ we have
$\mathbf{D}^d f(x)(y^1, \ldots, y^d) = \langle \mathbf{D}^d f(x), y^1 \otimes \cdots \otimes y^d \rangle$, where
$y^1 \otimes \cdots \otimes y^d = (y^1_{i_1} y^2_{i_2} \cdots y^d_{i_d})_{\mathbf{i} \in [n]^d}$.

We will also define the Hadamard product of two such matrices $M \circ N$ as a $d$-indexed matrix
with entries $m_{\mathbf{i}} = M_{\mathbf{i}} N_{\mathbf{i}}$ (pointwise multiplication of entries).

Let us also define the notion of "generalized diagonals" of a $d$-indexed matrix $A = (a_{\mathbf{i}})_{\mathbf{i} \in [n]^d}$.
For a fixed set $K \subseteq [d]$ with $\#K > 1$, the "generalized diagonal" corresponding to $K$ is the
set of indices $\{\mathbf{i} \in [n]^d \colon i_k = i_l \text{ for } k, l \in K\}$.

Constants. We will use the letter $C$ to denote absolute constants and $C_a$ for constants depending
only on some parameter $a$. In both cases the values of such constants may differ between
occurrences.

3 A concentration inequality for non-Lipschitz functions

In this section we prove Theorem 1.2. Let us first state our main tool, which is an inequality
by Latała in a decoupled version.

Theorem 3.1 (Latała, [43]). Let $A = (a_{\mathbf{i}})_{\mathbf{i} \in [n]^d}$ be a $d$-indexed matrix with real entries and let
$G_1, G_2, \ldots, G_d$ be i.i.d. standard Gaussian vectors in $\mathbb{R}^n$. Let $Z = \langle A, G_1 \otimes \cdots \otimes G_d \rangle$. Then for
every $p \ge 2$,

$$C_d^{-1} \sum_{\mathcal{J} \in P_d} p^{\#\mathcal{J}/2} \|A\|_{\mathcal{J}} \le \|Z\|_p \le C_d \sum_{\mathcal{J} \in P_d} p^{\#\mathcal{J}/2} \|A\|_{\mathcal{J}}.$$

Thanks to general decoupling inequalities for $U$-statistics [24], which we recall in the Appendix
(Theorem 7.1), the above theorem is formally equivalent to Theorem 1.1. In fact in [43]
Latała first proves the above version. In the proof of Theorem 3.3 we will need just Theorem
3.1 (in particular in this part of the article we do not need any decoupling inequalities).

From now on we will work in a more general setting than in Theorem 1.2 and assume that
$X$ is a random vector in $\mathbb{R}^n$, such that for all $p \ge 2$ there exists a constant $L_X(p)$ such that for
all bounded $C^1$ functions $f\colon \mathbb{R}^n \to \mathbb{R}$,

$$\|f(X) - \mathbb{E}f(X)\|_p \le L_X(p)\, \big\| |\nabla f(X)| \big\|_p. \tag{5}$$

Clearly in this situation the above inequality generalizes to all $C^1$ functions (if the right-hand
side is finite then the left-hand side is well defined and the inequality holds).



Let now $G$ be a standard $n$-dimensional Gaussian vector, independent of $X$. Using the
Fubini theorem together with the fact that for some absolute constant $C$, all $x \in \mathbb{R}^n$ and $p \ge 2$,
$C^{-1}\sqrt{p}\,|x| \le \|\langle x, G \rangle\|_p \le C\sqrt{p}\,|x|$, we can linearise the right-hand side above and write (5)
equivalently (up to absolute constants) as

$$\|f(X) - \mathbb{E}f(X)\|_p \le \frac{C L_X(p)}{\sqrt{p}}\, \|\langle \nabla f(X), G \rangle\|_p. \tag{6}$$
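The two-sided comparison of Gaussian moments used in this linearisation can be checked numerically. Since $\langle x, G \rangle$ has the same distribution as $|x|\,g$ with $g$ a standard Gaussian variable, it suffices to look at $\|g\|_p/\sqrt{p}$. The sketch below evaluates it from the classical formula $\mathbb{E}|g|^p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$; the function names are illustrative.

```python
import math


def gaussian_lp_norm(p):
    """||g||_p for g ~ N(0, 1), from E|g|^p = 2^(p/2) * Gamma((p + 1) / 2) / sqrt(pi)."""
    log_moment = (
        0.5 * p * math.log(2.0)
        + math.lgamma(0.5 * (p + 1.0))
        - 0.5 * math.log(math.pi)
    )
    return math.exp(log_moment / p)


def ratio(p):
    """||g||_p / sqrt(p), which stays between two absolute constants for p >= 2."""
    return gaussian_lp_norm(p) / math.sqrt(p)


if __name__ == "__main__":
    for p in (2, 4, 10, 50, 200):
        print(p, ratio(p))
```

For $p = 2$ the ratio equals $1/\sqrt{2}$, and by Stirling's formula it approaches $1/\sqrt{e}$ as $p$ grows, so in this range the absolute constant $C$ can be taken below 2.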

We remark that similar linearisation has been used by Maurey and Pisier to provide a simple 
proof of the Gaussian concentration inequality [55, 56] (see remark following Theorem 3.3 below). 
Inequality (6) has an advantage over (5) as it allows for iteration leading to the following simple 
proposition. 

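Proposition 3.2. Consider $p \ge 2$ and let $X$ be an $n$-dimensional random vector satisfying (5).
Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be a $C^D$ function. Let moreover $G_1, \ldots, G_D$ be independent standard Gaussian
vectors in $\mathbb{R}^n$, independent of $X$. Then for all $p \ge 2$, if $\mathbf{D}^D f(X) \in L^p$, then

$$\|f(X) - \mathbb{E}f(X)\|_p \le \frac{C^D L_X(p)^D}{p^{D/2}}\, \big\| \langle \mathbf{D}^D f(X), G_1 \otimes \cdots \otimes G_D \rangle \big\|_p + \sum_{1 \le d \le D-1} \frac{C^d L_X(p)^d}{p^{d/2}}\, \big\| \langle \mathbb{E}_X \mathbf{D}^d f(X), G_1 \otimes \cdots \otimes G_d \rangle \big\|_p. \tag{7}$$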

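Proof. Induction on $D$. For $D = 1$ the assertion of the proposition coincides with (6), which
(as already noted) is equivalent to (5). Let us assume that the proposition holds for $D - 1$.
Applying thus (7) with $D - 1$ instead of $D$, we obtain

$$\|f(X) - \mathbb{E}f(X)\|_p \le \frac{C^{D-1} L_X(p)^{D-1}}{p^{(D-1)/2}}\, \big\| \langle \mathbf{D}^{D-1} f(X), G_1 \otimes \cdots \otimes G_{D-1} \rangle \big\|_p + \sum_{d=1}^{D-2} \frac{C^d L_X(p)^d}{p^{d/2}}\, \big\| \langle \mathbb{E}_X \mathbf{D}^d f(X), G_1 \otimes \cdots \otimes G_d \rangle \big\|_p. \tag{8}$$

Applying now the triangle inequality in $L^p$, we get

$$\big\| \langle \mathbf{D}^{D-1} f(X), G_1 \otimes \cdots \otimes G_{D-1} \rangle \big\|_p \le \big\| \langle \mathbf{D}^{D-1} f(X) - \mathbb{E}_X \mathbf{D}^{D-1} f(X), G_1 \otimes \cdots \otimes G_{D-1} \rangle \big\|_p + \big\| \langle \mathbb{E}_X \mathbf{D}^{D-1} f(X), G_1 \otimes \cdots \otimes G_{D-1} \rangle \big\|_p. \tag{9}$$

Let us now apply (6) conditionally on $G_1, \ldots, G_{D-1}$ to the function
$f_1(x) = \langle \mathbf{D}^{D-1} f(x), G_1 \otimes \cdots \otimes G_{D-1} \rangle$. Since
$\langle \mathbf{D}^{D-1} f(X) - \mathbb{E}_X \mathbf{D}^{D-1} f(X), G_1 \otimes \cdots \otimes G_{D-1} \rangle = f_1(X) - \mathbb{E}_X f_1(X)$ and
$\langle \nabla f_1(X), G_D \rangle = \langle \mathbf{D}^D f(X), G_1 \otimes \cdots \otimes G_D \rangle$, we obtain

$$\mathbb{E}_X \big| \langle \mathbf{D}^{D-1} f(X) - \mathbb{E}_X \mathbf{D}^{D-1} f(X), G_1 \otimes \cdots \otimes G_{D-1} \rangle \big|^p \le \frac{C^p L_X(p)^p}{p^{p/2}}\, \mathbb{E}_{X, G_D} \big| \langle \mathbf{D}^D f(X), G_1 \otimes \cdots \otimes G_D \rangle \big|^p.$$

To finish the proof it is now enough to integrate this inequality with respect to the remaining
Gaussian vectors and combine the obtained estimate with (8) and (9). □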

Let us now specialize to the case when $L_X(p) = Lp^{\gamma}$ for some $L > 0$, $\gamma \ge 1/2$. Combining
the above proposition with Latała's Theorem 3.1, we obtain immediately the following theorem,
a special case of which is Theorem 1.2.

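Theorem 3.3. Assume that $X$ is a random vector in $\mathbb{R}^n$, such that for some constants $L > 0$,
$\gamma \ge 1/2$, all smooth functions $f$ and all $p \ge 2$,

$$\|f(X) - \mathbb{E}f(X)\|_p \le L p^{\gamma}\, \big\| |\nabla f(X)| \big\|_p. \tag{10}$$

For any smooth function $f\colon \mathbb{R}^n \to \mathbb{R}$ of class $C^D$ and $p \ge 2$, if $\mathbf{D}^D f(X) \in L^p$, then

$$\|f(X) - \mathbb{E}f(X)\|_p \le C_D \Big( L^D \sum_{\mathcal{J} \in P_D} p^{(\gamma - 1/2)D + \#\mathcal{J}/2}\, \big\| \|\mathbf{D}^D f(X)\|_{\mathcal{J}} \big\|_p + \sum_{1 \le d \le D-1} L^d \sum_{\mathcal{J} \in P_d} p^{(\gamma - 1/2)d + \#\mathcal{J}/2}\, \|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}} \Big).$$

If $\mathbf{D}^D f$ is bounded uniformly on $\mathbb{R}^n$, then for all $t > 0$,

$$\mathbb{P}(|f(X) - \mathbb{E}f(X)| \ge t) \le 2\exp\Big( -\frac{1}{C_D} \eta_f(t) \Big),$$

where

$$\eta_f(t) = \min(A, B),$$

$$A = \min_{\mathcal{J} \in P_D} \Big( \frac{t}{L^D \sup_{x \in \mathbb{R}^n} \|\mathbf{D}^D f(x)\|_{\mathcal{J}}} \Big)^{2/((2\gamma - 1)D + \#\mathcal{J})},$$

$$B = \min_{1 \le d \le D-1} \min_{\mathcal{J} \in P_d} \Big( \frac{t}{L^d \|\mathbb{E}\mathbf{D}^d f(X)\|_{\mathcal{J}}} \Big)^{2/((2\gamma - 1)d + \#\mathcal{J})}.$$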



Proof. The first part is a straightforward combination of Proposition 3.2 and Theorem 3.1. The
second part follows from the first one by Chebyshev's inequality $\mathbb{P}(|Y| \ge e\|Y\|_p) \le \exp(-p)$
applied with $p = \eta_f(t)/C_D$ (note that if $\eta_f(t)/C_D < 2$ then one can make the tail bound
asserted in the theorem trivial by adjusting the constants). □
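The exponents appearing in $A$ and $B$ are simple functions of $\gamma$, the order $d$ of the derivative and the number of blocks of the partition. The sketch below tabulates them; the function name is illustrative. It shows how the general statement reduces to the exponents $2/\#\mathcal{J}$ of Theorem 1.2 when $\gamma = 1/2$, and how the tails become heavier under the Poincaré-type assumption $\gamma = 1$.

```python
def tail_exponent(gamma, d, blocks):
    """Exponent 2 / ((2 * gamma - 1) * d + #J) applied to t / (L^d * norm) in Theorem 3.3."""
    return 2.0 / ((2.0 * gamma - 1.0) * d + blocks)


if __name__ == "__main__":
    for gamma in (0.5, 1.0):
        for d in (1, 2, 3):
            # A partition of [d] has between 1 and d blocks.
            row = [round(tail_exponent(gamma, d, k), 3) for k in range(1, d + 1)]
            print("gamma =", gamma, "d =", d, "exponents:", row)
```

For $d = 1$ this gives exponent 2 (Gaussian-type tails) when $\gamma = 1/2$ and exponent 1 (exponential tails) when $\gamma = 1$.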

Remark. In [55, 56] Pisier presents a stronger inequality than (10) with $\gamma = 1/2$. More
specifically, he proves that if $X$, $G$ are independent standard centred Gaussian vectors in $\mathbb{R}^n$, $E$
is a Banach space and $f\colon \mathbb{R}^n \to E$ is a $C^1$ function, then for every convex function $\Phi\colon E \to \mathbb{R}$,

$$\mathbb{E}\Phi\big( f(X) - \mathbb{E}f(X) \big) \le \mathbb{E}\Phi\big( L \langle \nabla f(X), G \rangle \big), \tag{11}$$

where $L = \frac{\pi}{2}$. As noted in [46], Caffarelli's contraction principle [20] implies that, e.g., a
random vector $X$ with density $e^{-V}$, where $V\colon \mathbb{R}^n \to \mathbb{R}$ satisfies $\mathbf{D}^2 V \ge \lambda\, \mathrm{Id}$, $\lambda > 0$, satisfies the
above inequality with $L = \frac{\pi}{2\sqrt{\lambda}}$ (where $G$ is still a standard Gaussian vector independent of $X$).
Therefore in this situation a similar approach as in the proof of Proposition 3.2 can be used for
functions $f$ with values in a general Banach space. Moreover, a counterpart of Latała's results is
known for chaoses with values in a Hilbert space (to the best of our knowledge this observation
has not been published; in fact it can be quite easily obtained from the version for real-valued
chaoses). Thus in this case we can obtain a counterpart of Theorem 3.3 (with $\gamma = 1/2$) for
Hilbert space-valued functions. In the case of a general Banach space two-sided estimates for
Banach space-valued Gaussian chaoses are not known. Still, one can use some known inequalities
(like hypercontraction or the Borell-Arcones-Giné inequality) instead of Theorem 3.1 and thus obtain
new concentration bounds. We remark that if one uses hypercontraction, one can obtain explicit
dependence of the constants on the degree of the polynomial, since explicit constants are known
for hypercontractive estimates of (Banach space-valued) Gaussian chaoses and one can keep
track of them during the proof. We skip the details.

In view of Theorem 3.3 a natural question arises: for what measures is the inequality (10)
satisfied? Before we provide examples, for technical reasons let us recall the definition of the
length of the gradient of a locally Lipschitz function. For a metric space $(\mathcal{X}, d)$, a locally Lipschitz
function $f\colon \mathcal{X} \to \mathbb{R}$ and $x \in \mathcal{X}$, we define

$$|\nabla f|(x) = \limsup_{d(x,y) \to 0} \frac{|f(y) - f(x)|}{d(x,y)}. \tag{12}$$

If $\mathcal{X} = \mathbb{R}^n$ and $f$ is differentiable at $x$, then clearly $|\nabla f|(x)$ coincides with the Euclidean
length of the usual gradient $\nabla f(x)$. For this reason, with slight abuse of notation, we will write
$|\nabla f(x)|$ instead of $|\nabla f|(x)$. We will consider only measures on $\mathbb{R}^n$, however since we allow
measures which are not necessarily absolutely continuous with respect to the Lebesgue measure,
at some points in the proofs we will work with the above abstract definition.

Going back to the question of measures satisfying (10), it is well known (see e.g. [50]) that
if $X$ satisfies the Poincaré inequality

$$\mathrm{Var}(f(X)) \le D_{Poin}\, \mathbb{E}|\nabla f(X)|^2 \tag{13}$$

for all locally Lipschitz bounded functions, then $X$ satisfies (10) with $\gamma = 1$ and $L = C\sqrt{D_{Poin}}$
(recall that $C$ always denotes a universal constant). Assume now that $X$ satisfies the logarithmic
Sobolev inequality

$$\mathrm{Ent} f^2(X) \le D_{LS}\, \mathbb{E}|\nabla f(X)|^2 \tag{14}$$

for locally Lipschitz bounded functions, where for a nonnegative random variable $Y$,

$$\mathrm{Ent}\, Y = \mathbb{E}Y \log Y - \mathbb{E}Y \log(\mathbb{E}Y).$$

Then, by the results from [3], it follows that $X$ satisfies (10) with $\gamma = 1/2$ and $L = \sqrt{D_{LS}/2}$.
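For readers who want to experiment with the entropy functional entering (14), here is a minimal sketch for a nonnegative, finitely supported random variable, with the usual convention $0 \log 0 = 0$. The function name and the input format (lists of values and probabilities) are assumptions of this illustration.

```python
import math


def entropy_functional(values, probs):
    """Ent Y = E[Y log Y] - E[Y] log E[Y] for a nonnegative finitely supported Y."""
    def xlogx(u):
        return u * math.log(u) if u > 0.0 else 0.0

    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * xlogx(v) for v, p in zip(values, probs)) - xlogx(mean)


if __name__ == "__main__":
    print(entropy_functional([3.0], [1.0]))            # a constant has zero entropy
    print(entropy_functional([1.0, 4.0], [0.5, 0.5]))  # strictly positive otherwise
```

By Jensen's inequality applied to the convex function $u \mapsto u \log u$, the functional is nonnegative and vanishes for constants, which is why it can play the role of the variance in (13).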

We will now generalize this observation to measures satisfying the so-called modified logarithmic
Sobolev inequality (introduced in [27]). We will present it in greater generality than
needed for proving (10), since we will use it later (in Section 6) to prove refined concentration
results for random vectors with independent Weibull coordinates.

Let $\beta \in [2, \infty)$. We will say that a random vector $Y \in \mathbb{R}^k$ satisfies a $\beta$-modified logarithmic
Sobolev inequality if for every locally Lipschitz bounded positive function $f\colon \mathbb{R}^k \to \mathbb{R}$,

$$\mathrm{Ent} f^2(Y) \le D_{LS_\beta} \Big( \mathbb{E}|\nabla f(Y)|^2 + \mathbb{E}\frac{|\nabla f(Y)|^{\beta}}{f(Y)^{\beta - 2}} \Big). \tag{15}$$



Let us also introduce two quantities, measuring the length of the gradient in product spaces.
Consider a locally Lipschitz function $f\colon \mathbb{R}^{mk} \to \mathbb{R}$, where we identify $\mathbb{R}^{mk}$ with the $m$-fold
Cartesian product of $\mathbb{R}^k$. Let $x = (x_1, \ldots, x_m)$, where $x_i \in \mathbb{R}^k$. For each $i = 1, \ldots, m$, let
$|\nabla_i f(x)|$ be the length of the gradient of $f$, treated as a function of $x_i$ only, with the other
coordinates fixed. Now for $r \ge 1$, set

$$|\nabla f(x)|_r = \Big( \sum_{i=1}^{m} |\nabla_i f(x)|^r \Big)^{1/r}.$$

Note that if $f$ is differentiable at $x$, then $|\nabla f(x)|_2 = |\nabla f(x)|$ (the Euclidean length of the "true"
gradient), whereas for $k = 1$ (and $f$ differentiable), $|\nabla f(x)|_r$ is the $\ell_r^m$ norm of $\nabla f(x)$.
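For a differentiable function these quantities can be computed directly from the usual gradient: split it into $m$ blocks of length $k$, take the Euclidean norm of each block and then the $\ell_r$ norm of the resulting $m$ numbers. The sketch below does exactly that; the function name and the flat-list representation of the gradient are choices of this illustration, and it does not cover the non-differentiable case handled by the abstract definition (12).

```python
import math


def block_gradient_norm(gradient, m, k, r):
    """|grad f(x)|_r for differentiable f on R^(m*k), given its gradient as a flat list."""
    if len(gradient) != m * k:
        raise ValueError("gradient must have length m * k")
    block_norms = [
        math.sqrt(sum(g * g for g in gradient[i * k:(i + 1) * k]))
        for i in range(m)
    ]
    return sum(b ** r for b in block_norms) ** (1.0 / r)


if __name__ == "__main__":
    grad = [3.0, 4.0, 0.0, 12.0]
    print(block_gradient_norm(grad, m=2, k=2, r=2))  # Euclidean length of the gradient
    print(block_gradient_norm(grad, m=4, k=1, r=1))  # l_1 norm when k = 1
```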

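Theorem 3.4. Let $\beta \in [2, \infty)$ and $Y$ be a random vector in $\mathbb{R}^k$ satisfying (15). Consider a
random vector $X = (X_1, \ldots, X_m)$ in $\mathbb{R}^{mk}$, where $X_1, \ldots, X_m$ are independent copies of $Y$. Then
for any locally Lipschitz $f\colon \mathbb{R}^{mk} \to \mathbb{R}$ such that $f(X)$ is integrable, and $p \ge 2$,

$$\|f(X) - \mathbb{E}f(X)\|_p \le C_\beta \Big( D_{LS_\beta}^{1/2}\, p^{1/2}\, \big\| |\nabla f(X)|_2 \big\|_p + D_{LS_\beta}^{1/\beta}\, p^{1/\alpha}\, \big\| |\nabla f(X)|_\beta \big\|_p \Big), \tag{16}$$

where $\alpha = \frac{\beta}{\beta - 1}$ is the Hölder conjugate of $\beta$.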

In particular, using the above theorem with $m = 1$ and $k = n$, we obtain the following

Corollary 3.5. If $X$ is a random vector in $\mathbb{R}^n$ which satisfies the $\beta$-modified log-Sobolev
inequality (15), then it satisfies (10) with $\gamma = 1 - \frac{1}{\beta} \ge \frac{1}{2}$ and $L = C_\beta \max\big( D_{LS_\beta}^{1/2}, D_{LS_\beta}^{1/\beta} \big)$.

We remark that in the class of logarithmically concave random vectors, the $\beta$-modified
log-Sobolev inequality is known to be equivalent to concentration for 1-Lipschitz functions of the
form $\mathbb{P}(|f(X) - \mathbb{E}f(X)| \ge t) \le 2\exp(-c\,t^{\beta/(\beta - 1)})$ [51].

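Proof of Theorem 3.4. By the tensorization property of entropy (see e.g. [45], Proposition 5.6)
we get for all positive locally Lipschitz bounded functions $f\colon \mathbb{R}^{mk} \to \mathbb{R}$,

$$\mathrm{Ent} f^2(X) \le D_{LS_\beta} \Big( \mathbb{E}|\nabla f(X)|_2^2 + \sum_{i=1}^{m} \mathbb{E}\frac{|\nabla_i f(X)|^{\beta}}{f(X)^{\beta - 2}} \Big). \tag{17}$$

Following [3], consider now any locally Lipschitz bounded $f > 0$ and denote $F(t) = \mathbb{E}f(X)^t$.
For $t > 2$,

$$F'(t) = \mathbb{E}\big( f(X)^t \log f(X) \big)$$

and

$$\frac{d}{dt} \big( \mathbb{E}f(X)^t \big)^{2/t} = \frac{d}{dt} F(t)^{2/t} = F(t)^{2/t} \cdot \frac{d}{dt} \Big( \frac{2}{t} \log F(t) \Big)$$

$$= F(t)^{2/t} \Big( \frac{2F'(t)}{tF(t)} - \frac{2}{t^2} \log F(t) \Big) = \frac{2}{t^2} F(t)^{2/t - 1} \big( tF'(t) - F(t) \log F(t) \big)$$

$$= \frac{2}{t^2} \big( \mathbb{E}f(X)^t \big)^{2/t - 1} \Big( \mathbb{E}\big( f(X)^t \log f(X)^t \big) - \big( \mathbb{E}f(X)^t \big) \log\big( \mathbb{E}f(X)^t \big) \Big).$$

By (17) applied to the function $g = f^{t/2} = \varphi \circ f$, where $\varphi(u) = |u|^{t/2}$,

$$\frac{d}{dt} \big( \mathbb{E}f(X)^t \big)^{2/t} \le \frac{2}{t^2} \big( \mathbb{E}f(X)^t \big)^{2/t - 1} \cdot D_{LS_\beta} \Big( \mathbb{E}|\nabla(\varphi \circ f)(X)|_2^2 + \mathbb{E}|\nabla(\varphi \circ f)(X)|_\beta^\beta\, f(X)^{t(2 - \beta)/2} \Big).$$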


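By the chain rule and the Hölder inequality for the pair of conjugate exponents $t/2$, $t/(t-2)$,

$$\mathbb{E}|\nabla(\varphi \circ f)(X)|_2^2 = \mathbb{E}\big( |\varphi'(f(X))| \cdot |\nabla f(X)|_2 \big)^2 \le \big( \mathbb{E}|\nabla f(X)|_2^t \big)^{2/t} \Big( \mathbb{E}\big( \varphi'(f(X)) \big)^{2t/(t-2)} \Big)^{(t-2)/t}$$

$$= \big\| |\nabla f(X)|_2 \big\|_t^2 \Big( \frac{t}{2} \Big)^2 \big( \mathbb{E}f(X)^t \big)^{1 - 2/t}.$$

Similarly, for $t \ge \beta$,

$$\mathbb{E}|\nabla(\varphi \circ f)(X)|_\beta^\beta\, f(X)^{t(2-\beta)/2} = \frac{t^\beta}{2^\beta}\, \mathbb{E}f(X)^{(t/2 - 1)\beta} |\nabla f(X)|_\beta^\beta\, f(X)^{t(2-\beta)/2} = \frac{t^\beta}{2^\beta}\, \mathbb{E}f(X)^{t - \beta} |\nabla f(X)|_\beta^\beta$$

$$\le \frac{t^\beta}{2^\beta} \big( \mathbb{E}f(X)^t \big)^{1 - \beta/t} \big( \mathbb{E}|\nabla f(X)|_\beta^t \big)^{\beta/t} = \frac{t^\beta}{2^\beta} \big( \mathbb{E}f(X)^t \big)^{1 - \beta/t} \big\| |\nabla f(X)|_\beta \big\|_t^\beta.$$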



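Thus we get for $\beta \le t \le p$,

$$\frac{d}{dt} \big( \mathbb{E}f(X)^t \big)^{2/t} \le \frac{D_{LS_\beta}}{2} \big\| |\nabla f(X)|_2 \big\|_t^2 + \frac{D_{LS_\beta}}{2^{\beta - 1}}\, t^{\beta - 2} \big( \mathbb{E}f(X)^t \big)^{(2 - \beta)/t} \big\| |\nabla f(X)|_\beta \big\|_t^\beta$$

$$\le \frac{D_{LS_\beta}}{2} \big\| |\nabla f(X)|_2 \big\|_p^2 + \frac{D_{LS_\beta}}{2^{\beta - 1}}\, t^{\beta - 2} \big( \mathbb{E}f(X)^t \big)^{(2 - \beta)/t} \big\| |\nabla f(X)|_\beta \big\|_p^\beta.$$

Denote $a = \frac{D_{LS_\beta}}{2} \big\| |\nabla f(X)|_2 \big\|_p^2$, $b = \frac{D_{LS_\beta}}{2^{\beta - 1}} \big\| |\nabla f(X)|_\beta \big\|_p^\beta$, $g(t) = \big( \mathbb{E}f(X)^t \big)^{2/t}$. The above inequality
can be written as

$$g'(t) \le a + b\, t^{\beta - 2}\, g(t)^{(2 - \beta)/2}$$

for $t \in [\beta, p]$ or, denoting $G = g^{\beta/2}$,

$$G'(t) \le \frac{\beta}{2} \Big( G(t)^{(\beta - 2)/\beta}\, a + t^{\beta - 2}\, b \Big).$$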

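For $\varepsilon > 0$ consider now the function $H_\varepsilon(t) = \big( g(\beta) + a(t - \beta) + b^{2/\beta} t^{2 - 2/\beta} + \varepsilon \big)^{\beta/2}$. We have

$$H_\varepsilon(\beta) > G(\beta)$$

and

$$H_\varepsilon'(t) = \frac{\beta}{2} H_\varepsilon(t)^{(\beta - 2)/\beta} \Big( a + \big( 2 - \tfrac{2}{\beta} \big) b^{2/\beta} t^{1 - 2/\beta} \Big) \ge \frac{\beta}{2} \Big( H_\varepsilon(t)^{(\beta - 2)/\beta}\, a + t^{\beta - 2}\, b \Big),$$

where we used the assumption $\beta \ge 2$. Using the last three inequalities together with the fact that
for $t > 0$ the function $x \mapsto x^{(\beta - 2)/\beta} a + t^{\beta - 2} b$ is increasing on $[0, \infty)$, we obtain that $G(t) \le H_\varepsilon(t)$
for all $t \in [\beta, p]$, which by taking $\varepsilon \to 0^+$ implies that for $p \ge \beta$,

$$g(p) = G(p)^{2/\beta} \le H_0(p)^{2/\beta} \le g(\beta) + \frac{D_{LS_\beta}}{2} (p - \beta) \big\| |\nabla f(X)|_2 \big\|_p^2 + \frac{D_{LS_\beta}^{2/\beta}}{4^{(\beta - 1)/\beta}}\, p^{2 - 2/\beta} \big\| |\nabla f(X)|_\beta \big\|_p^2,$$

i.e.,

$$\|f(X)\|_p^2 \le \|f(X)\|_\beta^2 + \frac{D_{LS_\beta}}{2} (p - \beta) \big\| |\nabla f(X)|_2 \big\|_p^2 + \frac{D_{LS_\beta}^{2/\beta}}{4^{(\beta - 1)/\beta}}\, p^{2 - 2/\beta} \big\| |\nabla f(X)|_\beta \big\|_p^2. \tag{18}$$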

The above inequality has been proved so far for strictly positive, locally Lipschitz functions
(the boundedness assumption can be easily removed by truncation and passage to the limit).
For the case of a general locally Lipschitz function $f$, take any $\varepsilon > 0$ and consider $\tilde f = |f| + \varepsilon$.
Since $\tilde f$ is strictly positive and locally Lipschitz, the above inequality holds also for $\tilde f$. Taking
$\varepsilon \to 0^+$, we can now extend (18) to arbitrary locally Lipschitz $f$.

Finally, assume $f\colon \mathbb{R}^{nk} \to \mathbb{R}$ is locally Lipschitz and $f(X)$ is integrable. Applying (18) to
$f - \mathbb{E}f(X)$ instead of $f$ and taking the square root (with $\sqrt{x+y+z} \le \sqrt{x}+\sqrt{y}+\sqrt{z}$), we obtain

$$\|f(X)-\mathbb{E}f(X)\|_p \le \|f(X)-\mathbb{E}f(X)\|_\beta + \Big(\frac{D_{LS_\beta}}{2}\Big)^{1/2}(p-\beta)^{1/2}\big\||\nabla f(X)|_2\big\|_p + \frac{D_{LS_\beta}^{1/\beta}}{2^{(\beta-1)/\beta}}p^{(\beta-1)/\beta}\big\||\nabla f(X)|_\beta\big\|_p$$

for $p \ge \beta$. For $p \in [2,\beta]$, since (15) implies the Poincaré inequality with constant $D_{LS_\beta}/2$ (see
Proposition 2.3 in [27]), we get

$$\|f(X) - \mathbb{E}f(X)\|_p \le C D_{LS_\beta}^{1/2}\,p\,\big\||\nabla f(X)|_2\big\|_p$$

(see the remark following (13)). These two estimates yield (16) with $C_\beta = C\sqrt{\beta}$. □

3.1 Applications of Theorem 1.2 

Let us now present certain applications of estimates established in the previous section. For 
simplicity we will restrict to the basic setting presented in Theorem 1.2. 

3.1.1 Polynomials 

A typical application of Theorem 1.2 would be to obtain tail inequalities for multivariate
polynomials in the random vector X. The constants involved in such estimates do not depend on the
dimension, but only on the degree of the polynomial. As already mentioned in the introduction,
our results in this setting can be considered a transference of inequalities by Latała from the
tetrahedral Gaussian case to the case of not necessarily product random vectors and general
polynomials.

3.1.2 Additive functionals and related statistics

We will now consider three related classes of additive statistics of a random vector, often arising 
in various problems. 

Additive functionals. Let $X$ be a random vector in $\mathbb{R}^n$ satisfying (3). For a function
$f\colon \mathbb{R} \to \mathbb{R}$ define the random variable

$$Z_f = f(X_1) + \dots + f(X_n). \tag{19}$$

It is classical and follows from (3) by a simple application of the Chebyshev inequality that
if $f$ is smooth with $\|f'\|_\infty \le \alpha$, then for all $t > 0$,

$$\mathbb{P}(|Z_f - \mathbb{E}Z_f| \ge t) \le e^2\exp\Big(-\frac{t^2}{e^2L^2n\alpha^2}\Big). \tag{20}$$

Using Theorem 1.2 we can easily obtain inequalities which hold if $f$ is a polynomial-like
function, i.e., if $\|f^{(D)}\|_\infty < \infty$ for some $D$. Note that the derivatives of the function
$F(x_1,\dots,x_n) = f(x_1) + \dots + f(x_n)$ have a very simple diagonal form. In consequence, calculating
their $\|\cdot\|_{\mathcal{J}}$ norms is simple. More precisely, we have

$$D^dF(x) = \operatorname{diag}_d\big(f^{(d)}(x_1),\dots,f^{(d)}(x_n)\big),$$

where $\operatorname{diag}_d(x_1,\dots,x_n)$ stands for the $d$-indexed matrix $(a_{\mathbf{i}})_{\mathbf{i}\in[n]^d}$ such that $a_{\mathbf{i}} = x_i$ if
$i_1 = \dots = i_d = i$ and $a_{\mathbf{i}} = 0$ otherwise. It is easy to see that if $\mathcal{J} = \{[d]\}$, then
$\|\operatorname{diag}_d(x_1,\dots,x_n)\|_{\mathcal{J}} = \sqrt{x_1^2+\dots+x_n^2}$ and if $\#\mathcal{J} \ge 2$, then
$\|\operatorname{diag}_d(x_1,\dots,x_n)\|_{\mathcal{J}} = \max_{i\le n}|x_i|$. Therefore we obtain
the following corollary to Theorem 1.2. We will apply it in the next section to linear eigenvalue
statistics of random matrices.
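The two norm formulas for diagonal matrices can be sanity-checked numerically in the simplest case $d = 2$, where $\mathcal{J} = \{\{1,2\}\}$ gives the Hilbert-Schmidt norm and $\mathcal{J} = \{\{1\},\{2\}\}$ gives the operator norm. The following is only an illustrative sketch; the function names are ours and not part of the paper.

```python
import math
import random


def diag_hs_norm(x):
    """Norm for J = {{1,2}}: Euclidean norm of the entries of diag_2(x)."""
    return math.sqrt(sum(v * v for v in x))


def diag_bilinear_form(x, u, v):
    """Value of <diag_2(x), u (x) v> = sum_i x_i u_i v_i."""
    return sum(xi * ui * vi for xi, ui, vi in zip(x, u, v))


def random_unit_vector(n, rng):
    """A random vector of Euclidean norm one."""
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in w))
    return [t / norm for t in w]


def check_diag_operator_norm(x, trials=2000, seed=0):
    """Return (max|x_i|, best value found on random unit vectors, value at a basis vector)."""
    rng = random.Random(seed)
    n = len(x)
    op_norm = max(abs(v) for v in x)
    best_random = max(
        abs(diag_bilinear_form(x, random_unit_vector(n, rng), random_unit_vector(n, rng)))
        for _ in range(trials)
    )
    # The supremum is attained at the basis vector of the largest |x_i|.
    k = max(range(n), key=lambda i: abs(x[i]))
    e_k = [1.0 if i == k else 0.0 for i in range(n)]
    attained = abs(diag_bilinear_form(x, e_k, e_k))
    return op_norm, best_random, attained
```

Random unit vectors never exceed $\max_i |x_i|$, while a coordinate vector attains it, which is the content of the formula for $\#\mathcal{J} \ge 2$ when $d = 2$.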

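Corollary 3.6. Let $X$ be a random vector in $\mathbb{R}^n$ satisfying (3), $f\colon \mathbb{R} \to \mathbb{R}$ a $C^D$ function such
that $\|f^{(D)}\|_\infty < \infty$, and let $Z_f$ be defined by (19). Then for all $t > 0$,

$$\mathbb{P}(|Z_f - \mathbb{E}Z_f| \ge t) \le 2\exp\Big(-\frac{1}{C_D}\min\Big(\frac{t^2}{L^{2D}n\|f^{(D)}\|_\infty^2},\ \frac{t^{2/D}}{L^2\|f^{(D)}\|_\infty^{2/D}}\Big)\Big)$$

$$+\ 2\exp\Big(-\frac{1}{C_D}\min_{1\le d\le D-1}\frac{t^2}{L^{2d}\sum_{i=1}^n\big(\mathbb{E}f^{(d)}(X_i)\big)^2}\Big)$$

$$+\ 2\exp\Big(-\frac{1}{C_D}\min_{2\le d\le D-1}\frac{t^{2/d}}{L^2\max_{i\le n}|\mathbb{E}f^{(d)}(X_i)|^{2/d}}\Big).$$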

Clearly the case $D = 1$ of the above corollary recovers (20) up to constants. Moreover,
using the (yet unproven) Theorem 1.3 one can see that for $f(x) = x^D$ and $X$ being a standard
Gaussian vector in $\mathbb{R}^n$, the estimate of the corollary is optimal up to absolute constants (in this
case, since $Z_f$ is a sum of independent random variables, one can also use estimates from [32]).

Additive functionals of partial sums. Let us now consider a slightly more involved additive
functional of the form

$$S_f = \sum_{i=1}^n f\Big(\sum_{j=1}^i X_j\Big). \tag{21}$$

Such random variables arise, e.g., in the study of additive functionals of random walks (see
e.g. [59, 16]). For simplicity we will only discuss what can be obtained directly for Lipschitz
functions $f$ and what Theorem 1.2 gives for $f$ with bounded second derivative. Let thus
$F(x) = \sum_{i=1}^n f\big(\sum_{j=1}^i x_j\big)$. We have $\frac{\partial}{\partial x_i}F(x) = \sum_{l\ge i}f'\big(\sum_{j\le l}x_j\big)$. Therefore

$$|\nabla F(x)|^2 \le \|f'\|_\infty^2\sum_{i=1}^n(n-i+1)^2 \le n^3\|f'\|_\infty^2,$$

which, when combined with (3) and Chebyshev's inequality, yields

$$\mathbb{P}(|S_f - \mathbb{E}S_f| \ge t) \le 2\exp\Big(-\frac{t^2}{CL^2n^3\|f'\|_\infty^2}\Big).$$



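Now, let us assume that $f \in C^2$ and $f''$ is bounded. We have

$$|\mathbb{E}\nabla F(X)|^2 = \sum_{i=1}^n\Big(\sum_{l=i}^n\mathbb{E}f'\Big(\sum_{j=1}^l X_j\Big)\Big)^2.$$

Moreover

$$\frac{\partial^2}{\partial x_i\partial x_j}F(x_1,\dots,x_n) = \sum_{l\ge i\vee j}f''\Big(\sum_{k=1}^l x_k\Big)$$

and thus

$$\|D^2F(x)\|_{\{1,2\}}^2 = \sum_{i,j=1}^n\Big(\sum_{l\ge i\vee j}f''\Big(\sum_{k=1}^l x_k\Big)\Big)^2 \le \|f''\|_\infty^2\sum_{i=1}^n\sum_{j=1}^n(n - i\vee j + 1)^2 \le n^4\|f''\|_\infty^2.$$

Since $D^2F$ is a symmetric bilinear form, we have

$$\|D^2F(x)\|_{\{1\}\{2\}} = \sup_{|\alpha|\le 1}\Big|\sum_{i,j=1}^n\sum_{l\ge i\vee j}f''\Big(\sum_{k=1}^l x_k\Big)\alpha_i\alpha_j\Big|$$

$$\le \sup_{|\alpha|\le 1}\|f''\|_\infty\sum_{l=1}^n\Big(\sum_{i\le l}\alpha_i\Big)^2 \le \sup_{|\alpha|\le 1}\|f''\|_\infty\sum_{l=1}^n l\sum_{i\le l}\alpha_i^2 \le n^2\|f''\|_\infty.$$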



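Using the above estimates and Theorem 1.2 we obtain

$$\mathbb{P}(|S_f - \mathbb{E}S_f| \ge t) \le 2\exp\Big(-\frac{1}{C}\min\Big(\frac{t^2}{L^2\sum_{i=1}^n\big(\sum_{l=i}^n\mathbb{E}f'(\sum_{j=1}^l X_j)\big)^2},\ \frac{t^2}{L^4n^4\|f''\|_\infty^2},\ \frac{t}{L^2n^2\|f''\|_\infty}\Big)\Big).$$

To effectively bound the sub-Gaussian coefficient in the above inequality one should use some
additional information about the structure of the vector $X$. For a given function $f$ it is of order
at most $n^3$, but if, e.g., the function $f$ is even and $X$ is symmetric, it clearly vanishes. In this
case we get

$$\mathbb{P}(|S_f - \mathbb{E}S_f| \ge t) \le 2\exp\Big(-\frac{1}{C}\min\Big(\frac{t^2}{L^4n^4\|f''\|_\infty^2},\ \frac{t}{L^2n^2\|f''\|_\infty}\Big)\Big).$$

One can check that if for instance $X$ is a standard Gaussian vector in $\mathbb{R}^n$ and $f(x) = x^2$ then
this estimate is tight up to the value of the constant $C$.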
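The operator-norm bound for $D^2F$ used above rests on the rearrangement $\sum_{i,j}\sum_{l\ge i\vee j}c_l\alpha_i\alpha_j = \sum_l c_l\big(\sum_{i\le l}\alpha_i\big)^2$, where $c_l$ stands for $f''(\sum_{k\le l}x_k)$. A minimal numerical sketch of this identity follows; the helper names are hypothetical and chosen for illustration only.

```python
import math
import random


def quadratic_form_direct(c, alpha):
    """Triple sum: sum over i, j and l >= max(i, j) of c_l * alpha_i * alpha_j."""
    n = len(c)
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for l in range(max(i, j), n + 1):
                total += c[l - 1] * alpha[i - 1] * alpha[j - 1]
    return total


def quadratic_form_partial_sums(c, alpha):
    """Rearranged form: sum over l of c_l * (alpha_1 + ... + alpha_l)^2."""
    total = 0.0
    running = 0.0
    for c_l, a_l in zip(c, alpha):
        running += a_l
        total += c_l * running * running
    return total


def random_instance(n, seed=0):
    """Random coefficients with |c_l| <= 1 and a random unit vector alpha."""
    rng = random.Random(seed)
    c = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in raw))
    alpha = [t / norm for t in raw]
    return c, alpha
```

Since $(\sum_{i\le l}\alpha_i)^2 \le l$ for a unit vector $\alpha$, the rearranged form is at most $\max_l|c_l|\cdot n(n+1)/2 \le n^2\max_l|c_l|$, which is the bound stated in the text.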

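U-statistics. Our last application in this section will concern U-statistics (for simplicity of
order 2) of the random vector $X$, i.e., random variables of the form

$$U = \sum_{i\ne j}h_{ij}(X_i, X_j),$$

where $h_{ij}\colon \mathbb{R}^2 \to \mathbb{R}$ are smooth functions. Without loss of generality let us assume that
$h_{ij}(x,y) = h_{ji}(y,x)$.

A simple application of Chebyshev's inequality and (3) gives that if the partial derivatives of $h_{ij}$
are uniformly bounded on $\mathbb{R}^2$ then for all $t > 0$,

$$\mathbb{P}(|U - \mathbb{E}U| \ge t) \le 2\exp\Big(-\frac{1}{C}\frac{t^2}{L^2\sup_{x\in\mathbb{R}^n}\sum_{i=1}^n\big(\sum_{j\ne i}\frac{\partial}{\partial x}h_{ij}(x_i,x_j)\big)^2}\Big) \le 2\exp\Big(-\frac{1}{C}\frac{t^2}{L^2n^3\max_{i\ne j}\|\frac{\partial}{\partial x}h_{ij}\|_\infty^2}\Big).$$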


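For $h_{ij}$ of class $C^2$ with bounded derivatives of second order, a direct application of Theorem
1.2 gives

$$\mathbb{P}(|U - \mathbb{E}U| \ge t) \le 2\exp\Big(-\frac{1}{C}\min\Big(\frac{t^2}{L^2\alpha^2},\ \frac{t^2}{L^4\beta^2},\ \frac{t}{L^2\gamma}\Big)\Big),$$

where

$$\alpha^2 = \sum_{i=1}^n\Big(\sum_{j\ne i}\mathbb{E}\frac{\partial}{\partial x}h_{ij}(X_i,X_j)\Big)^2 \le n^3\max_{i\ne j}\Big|\mathbb{E}\frac{\partial}{\partial x}h_{ij}(X_i,X_j)\Big|^2,$$

$$\beta^2 = \sup_{x\in\mathbb{R}^n}\Big(\sum_{i\ne j}\Big(\frac{\partial^2}{\partial x\partial y}h_{ij}(x_i,x_j)\Big)^2 + \sum_{i=1}^n\Big(\sum_{j\ne i}\frac{\partial^2}{\partial x^2}h_{ij}(x_i,x_j)\Big)^2\Big) \le n^2\max_{i\ne j}\Big\|\frac{\partial^2}{\partial x\partial y}h_{ij}\Big\|_\infty^2 + n^3\max_{i\ne j}\Big\|\frac{\partial^2}{\partial x^2}h_{ij}\Big\|_\infty^2,$$

$$\gamma = \sup_{|\alpha|,|\beta|\le 1}\ \sup_{x\in\mathbb{R}^n}\Big|\sum_{i\ne j}\frac{\partial^2}{\partial x\partial y}h_{ij}(x_i,x_j)\alpha_i\beta_j + \sum_{i=1}^n\alpha_i\beta_i\sum_{j\ne i}\frac{\partial^2}{\partial x^2}h_{ij}(x_i,x_j)\Big| \le n\Big(\max_{i\ne j}\Big\|\frac{\partial^2}{\partial x\partial y}h_{ij}\Big\|_\infty + \max_{i\ne j}\Big\|\frac{\partial^2}{\partial x^2}h_{ij}\Big\|_\infty\Big).$$

In particular, if $h_{ij} = h$, a function with bounded derivatives of second order, we get
$\alpha^2 = O(n^3)$, $\beta^2 = O(n^3)$, $\gamma = O(n)$, which shows that the oscillations of $U$ are of order at most
$O(n^{3/2})$. In the case of U-statistics of independent random variables, generated by bounded $h$,
this is a well known fact, corresponding to the CLT and classical Hoeffding inequalities for
U-statistics. We remark that in the so called non-degenerate case, i.e. when $\operatorname{Var}(\mathbb{E}_X h(X,Y)) > 0$,
$n^{3/2}$ is then indeed the right normalization in the CLT for U-statistics (see e.g. [23]).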

3.1.3 Linear statistics of eigenvalues of random matrices 

We will now use Corollary 3.6 to obtain tail inequalities for linear eigenvalue statistics of random 
Wigner matrices. We remark that one could also apply to the random matrix case the other 
inequalities considered in the previous section, obtaining in particular estimates on U-statistics
of eigenvalues (which have been recently investigated by Lytova and Pastur [48]). We will focus
on linear eigenvalue statistics (additive functionals in the language of the previous section) and
obtain inequalities involving as a sub-Gaussian term a Sobolev norm of the function / with 
respect to the semicircle law (the limiting spectral distribution for Wigner ensembles). We refer 
the reader to the monographs [4, 6, 49, 54] for basic facts concerning random matrices. 

Consider thus a real symmetric $n\times n$ random matrix $A$ ($n \ge 2$) and let $\lambda_1 \le \dots \le \lambda_n$ be its
eigenvalues. We will be interested in concentration inequalities for functionals of the form

$$Z = \sum_{i=1}^n f(\lambda_i/\sqrt{n}).$$

In [30] Guionnet and Zeitouni obtained concentration inequalities for $Z$ with Lipschitz $f$
assuming that the entries of $A$ are independent and satisfy the log-Sobolev inequality with some
constant $L^2$. More specifically, they prove that for all $t > 0$,

$$\mathbb{P}(|Z - \mathbb{E}Z| \ge t) \le 2\exp\Big(-\frac{t^2}{CL^2\|f'\|_\infty^2}\Big).$$

(In fact they treat a more general case of banded matrices, but for simplicity we will focus on
the basic case.)

As a corollary to Theorem 1.2 we present below an inequality which complements the above
result. Our aim is to replace the strong parameter $\|f'\|_\infty$ controlling the sub-Gaussian tail by a
weaker Sobolev norm with respect to the semicircular law

$$d\rho(x) = \frac{1}{2\pi}\sqrt{4 - x^2}\,\mathbf{1}_{(-2,2)}(x)\,dx$$

(recall that this is the limiting spectral distribution for Wigner matrices). Imposing additional
smoothness assumptions on the function $f$, this can be done in a window $|t| \le c_f n$, where $c_f$
depends on $f$.
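To make the Sobolev-type quantity $\int f'^2\,d\rho$ concrete, the sketch below integrates against the semicircular density with a plain midpoint rule. It is an illustration only (the function names and the choice of quadrature are ours); it checks that $\rho$ is a probability measure with second moment 1 and fourth moment 2, so that for instance $f(x) = x^2$ gives $\int f'^2\,d\rho = 4\int x^2\,d\rho = 4$.

```python
import math


def semicircle_density(x):
    """Density of the semicircular law on (-2, 2)."""
    if abs(x) >= 2.0:
        return 0.0
    return math.sqrt(4.0 - x * x) / (2.0 * math.pi)


def semicircle_integral(func, steps=100_000):
    """Midpoint-rule approximation of the integral of func with respect to rho."""
    h = 4.0 / steps
    total = 0.0
    for m in range(steps):
        x = -2.0 + (m + 0.5) * h
        total += func(x) * semicircle_density(x)
    return total * h


def sobolev_term_for_square():
    """Integral of (f')^2 with respect to rho for f(x) = x^2, i.e. f'(x) = 2x."""
    return semicircle_integral(lambda x: (2.0 * x) ** 2)
```

The midpoint rule is adequate here because the only irregularity of the integrand is the square-root behaviour at the endpoints $\pm 2$.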

Proposition 3.7. Assume the entries of the matrix $A$ are independent (modulo symmetry
conditions), mean zero and variance one random variables, satisfying the logarithmic Sobolev
inequality (14) with constant $L^2$. If $f$ is $C^2$ with bounded second derivative, then for all $t > 0$,

$$\mathbb{P}(|Z - \mathbb{E}Z| \ge t) \le 2\exp\Big(-\frac{1}{C_L}\Big(\frac{t^2}{\int_{-2}^2 f'^2\,d\rho + n^{-2/3}\|f''\|_\infty^2}\wedge\frac{nt}{\|f''\|_\infty}\Big)\Big). \tag{22}$$

Remark. The case $f(x) = x^2$ shows that under the assumptions of Proposition 3.7 one cannot
expect a tail behaviour better than exponential for large $t$. Indeed, since
$Z = \frac{1}{n}(\lambda_1^2 + \dots + \lambda_n^2) = \frac{1}{n}\sum_{i,j\le n}A_{ij}^2$, even if $A$ is a matrix with standard Gaussian entries,
then for all $t > 0$, $\mathbb{P}(|Z - \mathbb{E}Z| \ge t) \ge \frac{1}{C}\exp(-C(t^2\wedge nt))$.

Remark. A similar inequality to (22) holds in the case of Hermitian matrices with independent
entries as well. In the proof given below one should invoke an appropriate result concerning the
speed of convergence of the spectral distribution of Wigner matrices to the semicircular law.

Proof. Let us identify the random matrix $A$ with a random vector $\tilde A = (A_{ij})_{1\le i\le j\le n}$ having
values in $\mathbb{R}^{n(n+1)/2}$ endowed with the standard Euclidean norm $|\tilde A| = \big(\sum_{1\le i\le j\le n}A_{ij}^2\big)^{1/2}$. Note
that $\|A\|_{HS} \le \sqrt{2}|\tilde A|$. By independence of coordinates of $\tilde A$ and the tensorization property
of the logarithmic Sobolev inequality (see, e.g., [45, Corollary 5.7]), $\tilde A$ also satisfies (14) with
constant $L^2$. Furthermore, by the Hoffman-Wielandt inequality (see, e.g., [4, Lemma 2.1.19])
which asserts that if $B$, $C$ are two $n\times n$ real symmetric (or Hermitian) matrices and $\lambda_i(B)$, $\lambda_i(C)$
resp. their eigenvalues arranged in nondecreasing order, then

$$\sum_{i=1}^n|\lambda_i(B) - \lambda_i(C)|^2 \le \|B - C\|_{HS}^2,$$

the map $\tilde A \mapsto (\lambda_1/\sqrt{n},\dots,\lambda_n/\sqrt{n}) \in \mathbb{R}^n$ is $\sqrt{2/n}$-Lipschitz. Therefore, the random vector
$(\lambda_1/\sqrt{n},\dots,\lambda_n/\sqrt{n})$ satisfies (14) with constant $2L^2/n$. In consequence, by the results from [3]
(see also Theorem 3.4), $(\lambda_1/\sqrt{n},\dots,\lambda_n/\sqrt{n})$ also satisfies (3) with constant $L/\sqrt{n}$. Applying
Corollary 3.6 with $D = 2$ we obtain

$$\mathbb{P}(|Z - \mathbb{E}Z| \ge t) \le 2\exp\Big(-\frac{1}{CL^2}\Big(\frac{t^2}{n^{-1}\sum_{i=1}^n\big(\mathbb{E}f'(\lambda_i/\sqrt{n})\big)^2 + L^2n^{-1}\|f''\|_\infty^2}\wedge\frac{nt}{\|f''\|_\infty}\Big)\Big). \tag{23}$$
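The Hoffman-Wielandt inequality invoked in the argument above can be checked by hand in the $2\times 2$ case, where the eigenvalues of a symmetric matrix have a closed form. The following is a small sanity check under that restriction, not a proof; the representation of a matrix as a triple `(a, b, c)` for $\begin{pmatrix}a & b\\ b & c\end{pmatrix}$ is our own convention.

```python
import math
import random


def eigenvalues_2x2(a, b, c):
    """Eigenvalues, in nondecreasing order, of the symmetric matrix [[a, b], [b, c]]."""
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    return (mid - rad, mid + rad)


def hs_distance_sq(m1, m2):
    """Squared Hilbert-Schmidt distance; the off-diagonal entry is counted twice."""
    (a1, b1, c1), (a2, b2, c2) = m1, m2
    return (a1 - a2) ** 2 + 2.0 * (b1 - b2) ** 2 + (c1 - c2) ** 2


def eigenvalue_distance_sq(m1, m2):
    """Squared Euclidean distance between the ordered spectra."""
    l1 = eigenvalues_2x2(*m1)
    l2 = eigenvalues_2x2(*m2)
    return sum((x - y) ** 2 for x, y in zip(l1, l2))


def worst_violation(trials=10_000, seed=0):
    """Largest observed value of (spectral distance)^2 - (HS distance)^2, floored at 0."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        m1 = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
        m2 = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
        worst = max(worst, eigenvalue_distance_sq(m1, m2) - hs_distance_sq(m1, m2))
    return worst
```

In exact arithmetic `worst_violation()` would return 0; in floating point it should stay at the level of rounding error.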
In what follows we shall estimate from above the term $n^{-1}\sum_{i=1}^n\big(\mathbb{E}f'(\lambda_i/\sqrt{n})\big)^2$ from (23).
First, by Jensen's inequality

$$\frac{1}{n}\sum_{i=1}^n\big(\mathbb{E}f'(\lambda_i/\sqrt{n})\big)^2 \le \mathbb{E}\frac{1}{n}\sum_{i=1}^n f'(\lambda_i/\sqrt{n})^2 = \int_{\mathbb{R}}(f')^2\,d\mu, \tag{24}$$

where $\mu$ is the expected spectral measure of the matrix $n^{-1/2}A$. According to Wigner's theorem,
for a fixed $f$, $\mu$ converges to the semicircular law as $n \to \infty$ and thus
$\int_{\mathbb{R}}(f')^2\,d\mu \to \int_{-2}^2(f')^2\,d\rho$.
A non-asymptotic bound on the term $\int_{\mathbb{R}}f'^2\,d\mu$ can be obtained using the result of Bobkov, Götze
and Tikhomirov [12] on the speed of convergence of the expected spectral distribution of real
Wigner matrices to the semicircular law. Since each entry of $A$ satisfies the logarithmic Sobolev
inequality with constant $L^2$, it also satisfies the Poincaré inequality with the same constant (see
e.g. [45, Chapter 5]). Therefore Theorem 1.1 from [12] gives

$$\sup_{x\in\mathbb{R}}|F_\mu(x) - F_\rho(x)| \le C_Ln^{-2/3}, \tag{25}$$

where $F_\mu$ and $F_\rho$ are the distribution functions of $\mu$ and $\rho$, respectively.

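The decay of $1 - F_\mu(x)$ and $F_\mu(x)$ as $x \to \infty$ and $x \to -\infty$ (resp.) can be obtained using
the sub-Gaussian concentration of $\lambda_n/\sqrt{n}$ and $\lambda_1/\sqrt{n}$, which is, e.g., a consequence of (3) for
the vector of eigenvalues of $n^{-1/2}A$. For example, for any $t \ge 0$,

$$\mathbb{P}\Big(\frac{\lambda_n}{\sqrt{n}} \ge \mathbb{E}\frac{\lambda_n}{\sqrt{n}} + t\Big) \le 2\exp\Big(-\frac{1}{C}\frac{nt^2}{L^2}\Big). \tag{26}$$

Using the classical technique of $\delta$-nets for estimating the operator norm of a matrix (see e.g. [56])
and the fact that the entries of $A$ are sub-Gaussian (as they satisfy the logarithmic Sobolev
inequality) one gets $\mathbb{E}\lambda_n \le \mathbb{E}\|A\|_{op} \le C_L\sqrt{n}$, which together with (26) yields

$$1 - F_\mu(C_L + t) \le \mathbb{P}\Big(\frac{\lambda_n}{\sqrt{n}} \ge C_L + t\Big) \le 2\exp\Big(-\frac{1}{C}\frac{nt^2}{L^2}\Big) \tag{27}$$

for all $t \ge 0$. Clearly, the same inequality holds for $F_\mu(-C_L - t)$. Integrating by parts,

$$\int_{\mathbb{R}}f'^2\,d\mu = \int_{\mathbb{R}}f'^2\,d\rho + \int_{\mathbb{R}}\big(f'(x)^2\big)'\big(F_\rho(x) - F_\mu(x)\big)\,dx. \tag{28}$$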

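Combining the uniform estimate (25) with (27) and using the elementary inequality $2xy \le x^2 + y^2$,
we estimate the last integral in (28) as follows:

$$\Big|\int_{\mathbb{R}}\big(f'(x)^2\big)'\big(F_\rho(x) - F_\mu(x)\big)\,dx\Big| \le \int_{\mathbb{R}}|2f'(x)f''(x)|\Big(\|F_\mu - F_\rho\|_\infty\wedge 2\exp\Big(-\frac{n\operatorname{dist}(x,[-C_L,C_L])^2}{CL^2}\Big)\Big)\,dx$$

$$\le \int_{\mathbb{R}}f'(x)^2\,d\nu(x) + \nu(\mathbb{R})\|f''\|_\infty^2, \tag{29}$$

where

$$d\nu(x) = \Big(C_Ln^{-2/3}\wedge 2\exp\Big(-\frac{\operatorname{dist}(x,[-C_L,C_L])^2}{2\sigma^2}\Big)\Big)\,dx \quad\text{and}\quad \sigma^2 = \frac{CL^2}{2n}.$$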

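We proceed to estimate the two last terms from (29). Take $r > 0$ such that

$$2e^{-r^2/(2\sigma^2)} = C_Ln^{-2/3} \tag{30}$$

or put $r = 0$ if no such $r$ exists. Note that if we assume $C_L \ge 1$, as we obviously can, then

$$r \le C_Ln^{-1/2}\sqrt{\log n}. \tag{31}$$

We shall need the following estimates, which are easy consequences of the standard estimate for
a Gaussian tail:

$$\int_r^\infty e^{-y^2/(2\sigma^2)}\,dy \le C\sigma e^{-r^2/(2\sigma^2)} \le C_L\sigma n^{-2/3} \le C_Ln^{-7/6} \tag{32}$$

and

$$\int_r^\infty y^2e^{-y^2/(2\sigma^2)}\,dy \le \Big(\int_r^\infty y^4e^{-y^2/(2\sigma^2)}\,dy\Big)^{1/2}\Big(\int_r^\infty e^{-y^2/(2\sigma^2)}\,dy\Big)^{1/2} \le C_L\sigma^{5/2}(\sigma n^{-2/3})^{1/2} \le C_Ln^{-11/6}. \tag{33}$$

Now, (30), (31) and (32) yield

$$\nu(\mathbb{R}) \le 2(C_L + r)C_Ln^{-2/3} + 4\int_r^\infty e^{-y^2/(2\sigma^2)}\,dy \le C_Ln^{-2/3}. \tag{34}$$

We shall also need the estimate for $\int_{\mathbb{R}}x^2\,d\nu(x)$ which follows from (30), (31) and (33):

$$\int_{\mathbb{R}}x^2\,d\nu(x) = \frac{2}{3}(C_L + r)^3C_Ln^{-2/3} + 4\int_r^\infty(C_L + y)^2e^{-y^2/(2\sigma^2)}\,dy \le C_Ln^{-2/3}. \tag{35}$$

In order to estimate $\int_{\mathbb{R}}f'^2\,d\nu$, take any $x_0 \in [-2,2]$ such that $|f'(x_0)|^2 \le \int_{-2}^2f'^2\,d\rho$, and use
$|f'(x)| \le |f'(x_0)| + |x - x_0|\,\|f''\|_\infty$ to obtain

$$\int_{\mathbb{R}}f'(x)^2\,d\nu(x) \le 2\Big(\int_{-2}^2f'^2\,d\rho\Big)\nu(\mathbb{R}) + 2\|f''\|_\infty^2\int_{\mathbb{R}}|x - x_0|^2\,d\nu(x)$$

$$\le 2\Big(\int_{-2}^2f'^2\,d\rho\Big)\nu(\mathbb{R}) + 4\|f''\|_\infty^2x_0^2\,\nu(\mathbb{R}) + 4\|f''\|_\infty^2\int_{\mathbb{R}}x^2\,d\nu(x).$$

Plugging (34) and (35) into the above yields

$$\int_{\mathbb{R}}f'(x)^2\,d\nu(x) \le C_Ln^{-2/3}\Big(\int_{-2}^2f'^2\,d\rho + \|f''\|_\infty^2\Big). \tag{36}$$

In turn, plugging (34) and (36) into (29) and then combining with (28) we finally get

$$\int_{\mathbb{R}}f'^2\,d\mu \le (1 + C_Ln^{-2/3})\int_{-2}^2f'^2\,d\rho + C_Ln^{-2/3}\|f''\|_\infty^2,$$

which combined with (23) and (24) completes the proof. □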

Remark. With some more work (using truncations or working directly on moments) one can
extend the above proposition to the case when $|f''(x)| \le a(1 + |x|^k)$ for some non-negative
integer $k$ and $a \in \mathbb{R}$. In this case we obtain

$$\mathbb{P}(|Z - \mathbb{E}Z| \ge t) \le 2\exp\Big(-\frac{1}{C_L}\frac{t^2}{\int_{-2}^2f'^2\,d\rho + C_{L,k}n^{-2/3}a^2}\wedge\frac{n}{C_{L,k}}\Big(\frac{t}{a}\Big)^{\frac{2}{k+2}}\Big).$$

We also remark that to obtain the inequality (23) one does not have to use independence of the
entries of $A$; it is enough to assume that the vector $\tilde A$ satisfies the inequality (3).

4 Two-sided estimates of moments for Gaussian polynomials

We will now prove Theorem 1.3, showing that in the case of general polynomials in Gaussian
variables, the estimates of Theorem 1.2 are optimal (up to constants depending only on the
degree of the polynomial). In the special case of tetrahedral polynomials this follows from
Latała's Theorem 1.1 and the following result by Kwapień.

Theorem 4.1 (Kwapień, [39]). If $X = (X_1,\dots,X_n)$ where $X_i$ are independent symmetric
random variables, $Q$ is a multivariate tetrahedral polynomial of degree $D$ with coefficients in a
Banach space $E$ and $Q_d$ is its homogeneous part of degree $d$, then for any symmetric convex
function $\Phi\colon E \to \mathbb{R}_+$ and any $d \in \{0,1,\dots,D\}$,

$$\mathbb{E}\Phi(Q_d(X)) \le \mathbb{E}\Phi(C_dQ(X)).$$

Indeed, when combined with Theorem 1.1 and the triangle inequality, the above theorem
gives the following

Corollary 4.2. Let

$$Z = \sum_{0\le d\le D}\ \sum_{\mathbf{i}\in[n]^d}a^{(d)}_{\mathbf{i}}\,g_{i_1}\cdots g_{i_d},$$

where $A_d = (a^{(d)}_{\mathbf{i}})_{\mathbf{i}\in[n]^d}$ is a $d$-indexed symmetric matrix of real numbers such that $a^{(d)}_{\mathbf{i}} = 0$ if
$i_k = i_l$ for some $k \ne l$ (we adopt the convention that for $d = 0$ we have a single number $a^{(0)}_\emptyset$).
Then for any $p \ge 2$,

$$C_D^{-1}\sum_{0\le d\le D}\ \sum_{\mathcal{J}\in P_d}p^{\#\mathcal{J}/2}\|A_d\|_{\mathcal{J}} \le \|Z\|_p \le C_D\sum_{0\le d\le D}\ \sum_{\mathcal{J}\in P_d}p^{\#\mathcal{J}/2}\|A_d\|_{\mathcal{J}}.$$

The strategy of proof of Theorem 1.3 is very simple and relies on infinite divisibility of 
Gaussian random vectors, which will help us approximate the law of a general polynomial in 
Gaussian variables by the law of a tetrahedral polynomial, for which we will use Corollary 4.2. 

It will be convenient to have the polynomial $f$ represented as a combination of multivariate
Hermite polynomials:

$$f(x_1,\dots,x_n) = \sum_{d=0}^D\ \sum_{\mathbf{d}\in\Delta_d^n}a_{\mathbf{d}}\,h_{d_1}(x_1)\cdots h_{d_n}(x_n), \tag{37}$$

where

$$\Delta_d^n = \{\mathbf{d} = (d_1,\dots,d_n)\colon \forall_{k\in[n]}\ d_k \ge 0 \text{ and } d_1+\dots+d_n = d\}$$

and $h_d(x) = (-1)^de^{x^2/2}\frac{d^d}{dx^d}e^{-x^2/2}$ is the $d$-th Hermite polynomial.

Let $(W_t)_{t\in[0,1]}$ be a standard Brownian motion. Consider standard Gaussian random
variables $g = W_1$ and, for any positive integer $N$,

$$g_{j,N} = \sqrt{N}\big(W_{j/N} - W_{(j-1)/N}\big), \quad j = 1,\dots,N.$$



For any $d \ge 0$, we have the following representation of $h_d(g) = h_d(W_1)$ as a multiple stochastic
integral (see [33, Example 7.12 and Theorem 3.21]),

$$h_d(g) = d!\int_0^1\int_0^{t_d}\cdots\int_0^{t_2}dW_{t_1}\cdots dW_{t_{d-1}}\,dW_{t_d}.$$

Approximating the multiple stochastic integral leads to

$$h_d(g) = d!\lim_{N\to\infty}N^{-d/2}\sum_{1\le j_1<\dots<j_d\le N}g_{j_1,N}\cdots g_{j_d,N} = \lim_{N\to\infty}N^{-d/2}\sum_{\mathbf{j}\in[N]^{\underline{d}}}g_{j_1,N}\cdots g_{j_d,N}, \tag{38}$$

where the limit is in $L^2(\Omega)$ (see [33, Theorem 7.3 and formula (7.9)]) and actually the
convergence holds in any $L^p$ (see [33, Theorem 3.50]). We remark that instead of multiple stochastic
integrals with respect to the Wiener process we could use the CLT for canonical U-statistics
(see [23, Chapter 4.2]); however, the stochastic integral framework seems more convenient as it
allows us to put all the auxiliary variables on the same probability space.
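The approximation (38) can be illustrated by simulation for small degrees. In the sketch below (an illustration under our own naming, not the construction used in the proof), the normalized increments $g_{j,N}$ are sampled directly as i.i.d. standard Gaussians, $g = N^{-1/2}\sum_j g_{j,N}$ plays the role of $W_1$, and the sum over tuples of pairwise distinct indices is evaluated through power sums, which is valid for $d \le 3$ only.

```python
import itertools
import math
import random


def distinct_tuple_sum_bruteforce(g, d):
    """Sum of g[j1] * ... * g[jd] over ordered tuples of pairwise distinct indices."""
    return sum(
        math.prod(g[j] for j in idx)
        for idx in itertools.permutations(range(len(g)), d)
    )


def distinct_tuple_sum(g, d):
    """The same sum for d in {1, 2, 3}, computed from power sums in linear time."""
    p1 = sum(g)
    p2 = sum(v * v for v in g)
    p3 = sum(v ** 3 for v in g)
    if d == 1:
        return p1
    if d == 2:
        return p1 * p1 - p2
    if d == 3:
        return p1 ** 3 - 3.0 * p1 * p2 + 2.0 * p3
    raise ValueError("only d = 1, 2, 3 are implemented in this sketch")


def hermite(d, x):
    """Hermite polynomials h_1, h_2, h_3 in the normalization of (37)."""
    return {1: x, 2: x * x - 1.0, 3: x ** 3 - 3.0 * x}[d]


def approximation_error(d, n_steps, seed=0):
    """|N^{-d/2} * (sum over distinct tuples) - h_d(g)| for one simulated path."""
    rng = random.Random(seed)
    increments = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    g = sum(increments) / math.sqrt(n_steps)
    approx = n_steps ** (-d / 2.0) * distinct_tuple_sum(increments, d)
    return abs(approx - hermite(d, g))
```

For $d = 2$ the error equals $|1 - N^{-1}\sum_j g_{j,N}^2|$, which is of order $N^{-1/2}$ by the law of large numbers; this is the simplest instance of the $L^2$ convergence stated above.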

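Now, consider $n$ independent copies $(W_t^{(i)})_{t\in[0,1]}$ of the Brownian motion ($i = 1,\dots,n$)
together with the corresponding Gaussian random variables: $g^{(i)} = W_1^{(i)}$ and, for $N \ge 1$,

$$g^{(i)}_{j,N} = \sqrt{N}\big(W^{(i)}_{j/N} - W^{(i)}_{(j-1)/N}\big), \quad j = 1,\dots,N.$$

In the lemma below we state the representation of a multivariate Hermite polynomial in the
variables $g^{(1)},\dots,g^{(n)}$ as a limit of tetrahedral polynomials in the variables $g^{(i)}_{j,N}$. To this end
we introduce some more notation. Let

$$G^{(n,N)} = \big(g^{(1)}_{1,N},\dots,g^{(1)}_{N,N},\ g^{(2)}_{1,N},\dots,g^{(2)}_{N,N},\ \dots,\ g^{(n)}_{1,N},\dots,g^{(n)}_{N,N}\big) = \big(g^{(i)}_{j,N}\big)_{(i,j)\in[n]\times[N]}$$

be a Gaussian vector with $n\times N$ coordinates. We identify here the set $[nN]$ with $[n]\times[N]$ via
the bijection $(i,j) \leftrightarrow (i-1)N + j$. We will also identify the sets $([n]\times[N])^d$ and $[n]^d\times[N]^d$ in
a natural way. For $d \ge 0$ and $\mathbf{d} \in \Delta_d^n$, let

$$I_{\mathbf{d}} = \{\mathbf{i}\in[n]^d\colon \forall_{l\in[n]}\ \#\mathbf{i}^{-1}(\{l\}) = d_l\},$$

and define a $d$-indexed matrix $B^{(N)}_{\mathbf{d}}$ of $n^d$ blocks each of size $N^d$ as follows: for $\mathbf{i}\in[n]^d$ and
$\mathbf{j}\in[N]^d$,

$$\big(B^{(N)}_{\mathbf{d}}\big)_{(\mathbf{i},\mathbf{j})} = \begin{cases}\dfrac{d_1!\cdots d_n!}{d!}N^{-d/2} & \text{if } \mathbf{i}\in I_{\mathbf{d}} \text{ and } (\mathbf{i},\mathbf{j}) := ((i_1,j_1),\dots,(i_d,j_d)) \in ([n]\times[N])^{\underline{d}},\\[4pt] 0 & \text{otherwise.}\end{cases}$$

Lemma 4.3. With the above notation, for any $p > 0$,

$$\big\langle B^{(N)}_{\mathbf{d}}, (G^{(n,N)})^{\otimes d}\big\rangle \longrightarrow h_{d_1}(g^{(1)})\cdots h_{d_n}(g^{(n)}) \quad\text{in } L^p(\Omega) \text{ as } N\to\infty.$$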

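Proof. Using (38) for each $h_{d_i}(g^{(i)})$,

$$h_{d_1}(g^{(1)})\cdots h_{d_n}(g^{(n)}) = \lim_{N\to\infty}N^{-d/2}\sum_{\mathbf{j}^{(1)}\in[N]^{\underline{d_1}}}\cdots\sum_{\mathbf{j}^{(n)}\in[N]^{\underline{d_n}}}\ \prod_{i=1}^n g^{(i)}_{j^{(i)}_1,N}\cdots g^{(i)}_{j^{(i)}_{d_i},N}.$$

For each $N$, the right-hand side equals

$$N^{-d/2}\,\frac{d_1!\cdots d_n!}{d!}\sum_{\mathbf{i}\in I_{\mathbf{d}}}\ \sum_{\substack{\mathbf{j}\in[N]^d \text{ s.t.}\\ (\mathbf{i},\mathbf{j})\in([n]\times[N])^{\underline{d}}}}g^{(i_1)}_{j_1,N}\cdots g^{(i_d)}_{j_d,N} = \big\langle B^{(N)}_{\mathbf{d}}, (G^{(n,N)})^{\otimes d}\big\rangle,$$

since $\#I_{\mathbf{d}} = \frac{d!}{d_1!\cdots d_n!}$. □

Note that $B^{(N)}_{\mathbf{d}}$ is symmetric, i.e., for any $\mathbf{i}\in[n]^d$, $\mathbf{j}\in[N]^d$, if $\pi\colon[d]\to[d]$ is a permutation
and $\mathbf{i}'\in[n]^d$, $\mathbf{j}'\in[N]^d$ are such that $\forall_{k\in[d]}\ i'_k = i_{\pi(k)}$ and $j'_k = j_{\pi(k)}$, then

$$\big(B^{(N)}_{\mathbf{d}}\big)_{(\mathbf{i}',\mathbf{j}')} = \big(B^{(N)}_{\mathbf{d}}\big)_{(\mathbf{i},\mathbf{j})}.$$

Moreover, $B^{(N)}_{\mathbf{d}}$ has zeros on "generalized diagonals", i.e., $\big(B^{(N)}_{\mathbf{d}}\big)_{(\mathbf{i},\mathbf{j})} = 0$ if $(i_k,j_k) = (i_l,j_l)$ for
some $k \ne l$.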

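Proof of Theorem 1.3. Let us first note that it is enough to prove the moment estimates; the
tail bound follows from them by the Paley-Zygmund inequality (see e.g. the proof of Corollary
1 in [43]). Moreover, the upper bound on moments follows directly from Theorem 1.2. For the
lower bound we use Lemma 4.3 to approximate the $L^p$ norm of $f(G) - \mathbb{E}f(G)$ with that of a
tetrahedral polynomial, for which we can use the lower bound from Corollary 4.2.

Assuming $f$ is of the form (37), Lemma 4.3 together with the triangle inequality implies

$$\lim_{N\to\infty}\Big\|\sum_{d=1}^D\Big\langle\sum_{\mathbf{d}\in\Delta_d^n}a_{\mathbf{d}}B^{(N)}_{\mathbf{d}},\ (G^{(n,N)})^{\otimes d}\Big\rangle\Big\|_p = \|f(G) - \mathbb{E}f(G)\|_p$$

for any $p > 0$, where $G = (g^{(1)},\dots,g^{(n)})$. It therefore remains to relate $\big\|\sum_{\mathbf{d}\in\Delta_d^n}a_{\mathbf{d}}B^{(N)}_{\mathbf{d}}\big\|_{\mathcal{J}}$ with
$\|\mathbb{E}D^df(G)\|_{\mathcal{J}}$ for any $d \ge 1$ and $\mathcal{J}\in P_d$. In fact we shall prove that

$$\lim_{N\to\infty}\Big\|\sum_{\mathbf{d}\in\Delta_d^n}a_{\mathbf{d}}B^{(N)}_{\mathbf{d}}\Big\|_{\mathcal{J}} = \frac{1}{d!}\big\|\mathbb{E}D^df(G)\big\|_{\mathcal{J}}, \tag{39}$$

which will end the proof.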



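Fix $d \ge 1$ and $\mathcal{J}\in P_d$. For any $\mathbf{d}\in\Delta_d^n$ define a symmetric $d$-indexed matrix $(b_{\mathbf{d}})_{\mathbf{i}\in[n]^d}$ as

$$(b_{\mathbf{d}})_{\mathbf{i}} = \begin{cases}\dfrac{d_1!\cdots d_n!}{d!} & \text{if } \mathbf{i}\in I_{\mathbf{d}},\\[4pt] 0 & \text{otherwise,}\end{cases}$$

and a symmetric $d$-indexed matrix $(\tilde B^{(N)}_{\mathbf{d}})_{(\mathbf{i},\mathbf{j})\in([n]\times[N])^d}$ as

$$\big(\tilde B^{(N)}_{\mathbf{d}}\big)_{(\mathbf{i},\mathbf{j})} = N^{-d/2}(b_{\mathbf{d}})_{\mathbf{i}} \quad\text{for all } \mathbf{i}\in[n]^d \text{ and } \mathbf{j}\in[N]^d.$$

It is a simple observation that

$$\Big\|\sum_{\mathbf{d}\in\Delta_d^n}a_{\mathbf{d}}\tilde B^{(N)}_{\mathbf{d}}\Big\|_{\mathcal{J}} = \Big\|\sum_{\mathbf{d}\in\Delta_d^n}a_{\mathbf{d}}(b_{\mathbf{d}})_{\mathbf{i}\in[n]^d}\Big\|_{\mathcal{J}}. \tag{40}$$

On the other hand, for any $\mathbf{d}\in\Delta_d^n$, the matrices $B^{(N)}_{\mathbf{d}}$ and $\tilde B^{(N)}_{\mathbf{d}}$ differ at no more than
$\#I_{\mathbf{d}}\cdot\#([N]^d\setminus[N]^{\underline{d}})$ entries. More precisely, if $\mathcal{J}_0 = \{[d]\}$ (a trivial partition of $[d]$ into one set),
then

$$\big\|B^{(N)}_{\mathbf{d}} - \tilde B^{(N)}_{\mathbf{d}}\big\|_{\mathcal{J}}^2 \le \big\|B^{(N)}_{\mathbf{d}} - \tilde B^{(N)}_{\mathbf{d}}\big\|_{\mathcal{J}_0}^2 \le \frac{d_1!\cdots d_n!}{d!}N^{-d}\Big(N^d - \frac{N!}{(N-d)!}\Big) \longrightarrow 0$$

as $N\to\infty$. Thus the triangle inequality for the $\|\cdot\|_{\mathcal{J}}$ norm together with (40) yields

$$\lim_{N\to\infty}\Big\|\sum_{\mathbf{d}\in\Delta_d^n}a_{\mathbf{d}}B^{(N)}_{\mathbf{d}}\Big\|_{\mathcal{J}} = \Big\|\sum_{\mathbf{d}\in\Delta_d^n}a_{\mathbf{d}}(b_{\mathbf{d}})_{\mathbf{i}\in[n]^d}\Big\|_{\mathcal{J}}. \tag{41}$$

Finally, note that

$$\mathbb{E}D^df(G) = d!\sum_{\mathbf{d}\in\Delta_d^n}a_{\mathbf{d}}(b_{\mathbf{d}})_{\mathbf{i}\in[n]^d}. \tag{42}$$

Indeed, using the identity on Hermite polynomials, $\frac{d}{dx}h_k(x) = kh_{k-1}(x)$ ($k \ge 1$), we obtain
$\mathbb{E}h_k^{(l)}(g) = k!\,\delta_{k,l}$ for $k,l \ge 0$, where $f^{(l)}$ stands for the $l$-th derivative of $f$, and thus, for any
$\mathbf{d}\in\Delta_d^n$,

$$\big(\mathbb{E}D^d\,h_{d_1}(g^{(1)})\cdots h_{d_n}(g^{(n)})\big)_{\mathbf{i}} = d!\,(b_{\mathbf{d}})_{\mathbf{i}} \quad\text{for each } \mathbf{i}\in[n]^d.$$

Now, (42) follows by linearity. Combining it with (41) proves (39). □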
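The identity $\mathbb{E}h_k^{(l)}(g) = k!\,\delta_{k,l}$ used in the last step can be verified exactly with integer arithmetic. The sketch below (names are ours) builds the coefficients of $h_k$ from the standard three-term recurrence $h_{m+1}(x) = xh_m(x) - mh_{m-1}(x)$, differentiates termwise, and integrates against the Gaussian law using $\mathbb{E}g^{2m} = (2m-1)!!$.

```python
from math import factorial


def hermite_coeffs(k):
    """Coefficients (lowest degree first) of h_k, via h_{m+1} = x*h_m - m*h_{m-1}."""
    prev, cur = [1], [0, 1]
    if k == 0:
        return prev
    for m in range(1, k):
        nxt = [0] + cur  # multiply h_m by x
        for idx, coeff in enumerate(prev):
            nxt[idx] -= m * coeff  # subtract m * h_{m-1}
        prev, cur = cur, nxt
    return cur


def derivative(coeffs):
    """Coefficients of the derivative of a polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:] or [0]


def gaussian_moment(m):
    """E g^m for a standard Gaussian g: (m-1)!! for even m, 0 for odd m."""
    if m % 2:
        return 0
    result = 1
    for j in range(1, m, 2):
        result *= j
    return result


def expected_lth_derivative(k, l):
    """E h_k^{(l)}(g), computed exactly."""
    poly = hermite_coeffs(k)
    for _ in range(l):
        poly = derivative(poly)
    return sum(c * gaussian_moment(i) for i, c in enumerate(poly))
```

For $l < k$ the expectation vanishes because $h_k^{(l)}$ is a multiple of $h_{k-l}$, which is centred; for $l > k$ the derivative is identically zero.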

Remark Note that the above infinite-divisibility argument can be also used to prove the upper 
bound on moments in Theorem 1.3 (giving a proof independent of the one relying on Theorem 
1.2). 

5 Polynomials in independent sub-Gaussian random variables 

In this section we prove Theorem 1.4. Before we proceed with the core of the proof we will need
to introduce some auxiliary inequalities for the norms $\|\cdot\|_{\mathcal{J}}$ as well as some additional notation.

5.1 Properties of $\|\cdot\|_{\mathcal{J}}$ norms

The first inequality we will need is standard and given in the following lemma (it is a
direct consequence of the definition of the norms $\|\cdot\|_{\mathcal{J}}$).

Lemma 5.1. For any $d$-indexed matrix $A = (a_{\mathbf{i}})_{\mathbf{i}\in[n]^d}$ and any vectors $v_1,\dots,v_d\in\mathbb{R}^n$ we have
for all $\mathcal{J}\in P_d$,

$$\Big\|A\circ\bigotimes_{i=1}^d v_i\Big\|_{\mathcal{J}} \le \|A\|_{\mathcal{J}}\prod_{i=1}^d\|v_i\|_\infty.$$

To formulate subsequent inequalities we need some auxiliary notation concerning $d$-indexed
matrices. We will treat matrices as functions from $[n]^d$ into the real line, which in particular
allows us to use the notation of indicator functions and for a set $C\subseteq[n]^d$ write $\mathbf{1}_C$ for
the matrix $(a_{\mathbf{i}})$ such that $a_{\mathbf{i}} = 1$ if $\mathbf{i}\in C$ and $a_{\mathbf{i}} = 0$ otherwise.

Note that for $\#\mathcal{J} > 1$, $\|\cdot\|_{\mathcal{J}}$ is not unconditional in the standard basis, i.e., in general it
is not true that $\|A\circ\mathbf{1}_C\|_{\mathcal{J}} \le \|A\|_{\mathcal{J}}$. One situation in which this inequality holds is when $C$ is
of the form $C = \{\mathbf{i}\colon i_{k_1} = j_1,\dots,i_{k_l} = j_l\}$ for some $1\le k_1<\dots<k_l\le d$ and $j_1,\dots,j_l\in[n]$
(which follows from Lemma 5.1). This corresponds to setting to zero all coefficients which are
outside a "generalized row" of a matrix and leaving the coefficients in this row intact.

Later we will need another inequality of this type, which will allow us to select a "generalized
diagonal" of a matrix. The corresponding estimate is given in the following

Lemma 5.2. Let $A = (a_{\mathbf{i}})_{\mathbf{i}\in[n]^d}$ be a $d$-indexed matrix and let $C\subseteq[n]^d$ be of the form
$C = \{\mathbf{i}\colon i_k = i_l \text{ for } k,l\in K\}$, with $K\subseteq[d]$. Then for every $\mathcal{J}\in P_d$, $\|A\circ\mathbf{1}_C\|_{\mathcal{J}} \le \|A\|_{\mathcal{J}}$.

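Proof. Since $\mathbf{1}_{C_1\cap C_2} = \mathbf{1}_{C_1}\circ\mathbf{1}_{C_2}$, it is enough to consider the case $\#K = 2$, i.e. $C = \{\mathbf{i}\colon i_k = i_l\}$
for some $1\le k<l\le d$. Let $\mathcal{J} = \{J_1,\dots,J_m\}$. We will consider two cases.

1. The numbers $k$ and $l$ are separated by the partition $\mathcal{J}$. Without loss of generality we can
assume that $k\in J_1$, $l\in J_2$. Then

$$\|A\circ\mathbf{1}_C\|_{\mathcal{J}} = \sup_{\|x^{(j)}\|_2\le 1\colon j\ge 3}\ \sup_{\|x^{(1)}\|_2,\|x^{(2)}\|_2\le 1}\ \sum_{\mathbf{i}_{J_1},\mathbf{i}_{J_2}}\mathbf{1}_{\{i_k = i_l\}}\Big(\sum_{\mathbf{i}_{(J_1\cup J_2)^c}}a_{\mathbf{i}}\,x^{(3)}_{\mathbf{i}_{J_3}}\cdots x^{(m)}_{\mathbf{i}_{J_m}}\Big)x^{(1)}_{\mathbf{i}_{J_1}}x^{(2)}_{\mathbf{i}_{J_2}}. \tag{43}$$

For any $x^{(3)},\dots,x^{(m)}$, consider the matrix

$$\big(B_{\mathbf{i}_{J_1},\mathbf{i}_{J_2}}\big)_{\mathbf{i}_{J_1},\mathbf{i}_{J_2}} = \Big(\sum_{\mathbf{i}_{(J_1\cup J_2)^c}}a_{\mathbf{i}}\,x^{(3)}_{\mathbf{i}_{J_3}}\cdots x^{(m)}_{\mathbf{i}_{J_m}}\Big)_{\mathbf{i}_{J_1},\mathbf{i}_{J_2}}$$

acting from $\ell_2([n]^{J_1})$ to $\ell_2([n]^{J_2})$.

For fixed $x^{(3)},\dots,x^{(m)}$ the inner expression on the right hand side of (43) is the operator
norm of the block-diagonal matrix obtained from $(B_{\mathbf{i}_{J_1},\mathbf{i}_{J_2}})$ by setting to zero entries in off-diagonal
blocks. Therefore it is not greater than the operator norm of $(B_{\mathbf{i}_{J_1},\mathbf{i}_{J_2}})$, which allows us to write

$$\|A\circ\mathbf{1}_C\|_{\mathcal{J}} \le \sup_{\|x^{(j)}\|_2\le 1\colon j\ge 3}\ \sup_{\|x^{(1)}\|_2,\|x^{(2)}\|_2\le 1}\ \sum_{\mathbf{i}_{J_1}}\sum_{\mathbf{i}_{J_2}}\Big(\sum_{\mathbf{i}_{(J_1\cup J_2)^c}}a_{\mathbf{i}}\,x^{(3)}_{\mathbf{i}_{J_3}}\cdots x^{(m)}_{\mathbf{i}_{J_m}}\Big)x^{(1)}_{\mathbf{i}_{J_1}}x^{(2)}_{\mathbf{i}_{J_2}} = \|A\|_{\mathcal{J}}.$$

2. There exists $j$ such that $k,l\in J_j$. Without loss of generality we can assume that $j = 1$. We
have

$$\|A\circ\mathbf{1}_C\|_{\mathcal{J}} = \sup_{\|x^{(j)}\|_2\le 1\colon j\ge 2}\Big(\sum_{\mathbf{i}_{J_1}}\mathbf{1}_{\{i_k = i_l\}}\Big(\sum_{\mathbf{i}_{J_1^c}}a_{\mathbf{i}}\,x^{(2)}_{\mathbf{i}_{J_2}}\cdots x^{(m)}_{\mathbf{i}_{J_m}}\Big)^2\Big)^{1/2}$$

$$\le \sup_{\|x^{(j)}\|_2\le 1\colon j\ge 2}\Big(\sum_{\mathbf{i}_{J_1}}\Big(\sum_{\mathbf{i}_{J_1^c}}a_{\mathbf{i}}\,x^{(2)}_{\mathbf{i}_{J_2}}\cdots x^{(m)}_{\mathbf{i}_{J_m}}\Big)^2\Big)^{1/2} = \|A\|_{\mathcal{J}}.$$

□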
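For a partition $\mathcal{K} = \{K_1,\dots,K_m\}\in P_d$ define

$$L(\mathcal{K}) = \{\mathbf{i}\in[n]^d\colon i_k = i_l \text{ iff } \exists_{j\le m}\ k,l\in K_j\}. \tag{44}$$

Thus $L(\mathcal{K})$ is the set of all indices for which the partition into level sets is equal to $\mathcal{K}$.

Corollary 5.3. For any $\mathcal{J},\mathcal{K}\in P_d$ and any $d$-indexed matrix $A$,

$$\|A\circ\mathbf{1}_{L(\mathcal{K})}\|_{\mathcal{J}} \le 2^{\#\mathcal{K}(\#\mathcal{K}-1)/2}\|A\|_{\mathcal{J}}.$$

Proof. By Lemma 5.2 and the triangle inequality for any $k<l$,
$\|A\circ\mathbf{1}_{\{i_k\ne i_l\}}\|_{\mathcal{J}} = \|A - A\circ\mathbf{1}_{\{i_k = i_l\}}\|_{\mathcal{J}} \le 2\|A\|_{\mathcal{J}}$. Now it is enough to note that $L(\mathcal{K})$ can be expressed as an
intersection of $\#\mathcal{K}$ "generalized diagonals" and $\#\mathcal{K}(\#\mathcal{K}-1)/2$ sets of the form $\{\mathbf{i}\colon i_k\ne i_l\}$ where $k<l$ and
use again Lemma 5.2 together with the above inequality. □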

5.2 Proof of Theorem 1.4 

Let us first note that the tail bound of Theorem 1.4 follows from the moment estimate and 
Chebyshev inequality in the same way as in Theorems 1.2 or 3.3. We will therefore focus on the 
moment bound. 

The method of proof will rely on the reduction to the Gaussian case via decoupling
inequalities, symmetrization and the contraction principle. To carry out this strategy we will need the
following representation of $f$:

$$f(x) = \sum_{0\le d\le D}\ \sum_{m=0}^d\ \sum_{\substack{k_1,\dots,k_m>0\\ k_1+\dots+k_m = d}}\ \sum_{\mathbf{i}\in[n]^{\underline{m}}}c^{(d)}_{(i_1,k_1),\dots,(i_m,k_m)}\,x_{i_1}^{k_1}\cdots x_{i_m}^{k_m}, \tag{45}$$

where the coefficients $c^{(d)}_{(i_1,k_1),\dots,(i_m,k_m)}$ satisfy

$$c^{(d)}_{(i_1,k_1),\dots,(i_m,k_m)} = c^{(d)}_{(i_{\pi(1)},k_{\pi(1)}),\dots,(i_{\pi(m)},k_{\pi(m)})} \tag{46}$$

for all permutations $\pi\colon[m]\to[m]$. At this point we would like to explain the convention
regarding indices which we will use throughout this section. It is rather standard, but we prefer
to draw the Reader's attention to it, as we will use it extensively in what follows. Namely, we
will treat the sequence $\mathbf{k} = (k_1,\dots,k_m)$ as a function acting on $[m]$ and taking values in positive
integers. In particular if $m = 0$, then $[m] = \emptyset$ and there exists exactly one function
$\mathbf{k}\colon[m]\to\mathbb{N}\setminus\{0\}$ (the empty function). Moreover by convention this function satisfies $\sum_i k_i = 0$ (as
the summation runs over an empty set). Therefore, for $d = 0$ and $m = 0$ the subsum over
$k_1,\dots,k_m$ and $\mathbf{i}$ above is equal to the free coefficient of the polynomial (which can be denoted
by $c^{(0)}_\emptyset$), since the summation over $k_1,\dots,k_m$ runs over a one-element set containing the empty
index/function and for this index there is exactly one index $\mathbf{i}\colon[m]\to\{1,\dots,n\}$, which belongs
to $[n]^{\underline{0}}$ (again the empty index). Here we also use the convention that a product over an empty
set is equal to one. On the other hand, for $d > 0$, the contribution from $m = 0$ is equal to zero
(as the empty index $\mathbf{k}$ does not satisfy the constraint $k_1+\dots+k_m = d$ and so the summation
over $k_1,\dots,k_m$ runs over the empty set).

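Using (45) together with independence of $X_1,\dots,X_n$, one may write

$$f(X) - \mathbb{E}f(X) = \sum_{1\le d\le D}\ \sum_{m=1}^d\ \sum_{\substack{k_1,\dots,k_m>0\\ k_1+\dots+k_m = d}}\ \sum_{\mathbf{i}\in[n]^{\underline{m}}}c^{(d)}_{(i_1,k_1),\dots,(i_m,k_m)}\sum_{\emptyset\ne J\subseteq[m]}\ \prod_{j\in J}\big(X_{i_j}^{k_j} - \mathbb{E}X_{i_j}^{k_j}\big)\prod_{j\notin J}\mathbb{E}X_{i_j}^{k_j}.$$

Rearranging the terms and using (46) together with the triangle inequality, we obtain

$$|f(X) - \mathbb{E}f(X)| \le \sum_{1\le d\le D}\ \sum_{a=1}^d\ \sum_{\substack{k_1,\dots,k_a>0\\ k_1+\dots+k_a = d}}\Big|\sum_{\mathbf{i}\in[n]^{\underline{a}}}d^{(k_1,\dots,k_a)}_{i_1,\dots,i_a}\big(X_{i_1}^{k_1} - \mathbb{E}X_{i_1}^{k_1}\big)\cdots\big(X_{i_a}^{k_a} - \mathbb{E}X_{i_a}^{k_a}\big)\Big|,$$

where

$$d^{(k_1,\dots,k_a)}_{i_1,\dots,i_a} = \sum_{a\le m\le D}\ \sum_{\substack{k_{a+1},\dots,k_m>0\\ k_1+\dots+k_m\le D}}\ \sum_{\substack{i_{a+1},\dots,i_m\\ (i_1,\dots,i_m)\in[n]^{\underline{m}}}}\binom{m}{a}c^{(k_1+\dots+k_m)}_{(i_1,k_1),\dots,(i_m,k_m)}\,\mathbb{E}X_{i_{a+1}}^{k_{a+1}}\cdots\mathbb{E}X_{i_m}^{k_m}.$$

Note that (46) implies that for every permutation $\pi\colon[a]\to[a]$,

$$d^{(k_1,\dots,k_a)}_{i_1,\dots,i_a} = d^{(k_{\pi(1)},\dots,k_{\pi(a)})}_{i_{\pi(1)},\dots,i_{\pi(a)}}. \tag{47}$$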

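Let now $X^{(1)},\dots,X^{(D)}$ be independent copies of the random vector $X$ and $(\varepsilon_i^{(j)})_{i\le n,j\le D}$ an
array of i.i.d. Rademacher variables independent of $(X^{(j)})_j$. For each $k_1,\dots,k_a$, by decoupling
inequalities (Theorem 7.1 in the Appendix) applied to the functions

$$h_{i_1,\dots,i_a}(x_1,\dots,x_a) = d^{(k_1,\dots,k_a)}_{i_1,\dots,i_a}\big(x_1^{k_1} - \mathbb{E}X_{i_1}^{k_1}\big)\cdots\big(x_a^{k_a} - \mathbb{E}X_{i_a}^{k_a}\big)$$

and standard symmetrization inequalities (applied conditionally $a$ times) we obtain

$$\|f(X) - \mathbb{E}f(X)\|_p \le C_D\sum_{d=1}^D\sum_{a=1}^d\ \sum_{\substack{k_1,\dots,k_a>0\\ k_1+\dots+k_a = d}}\Big\|\sum_{\mathbf{i}\in[n]^{\underline{a}}}d^{(k_1,\dots,k_a)}_{i_1,\dots,i_a}\big((X^{(1)}_{i_1})^{k_1} - \mathbb{E}(X^{(1)}_{i_1})^{k_1}\big)\cdots\big((X^{(a)}_{i_a})^{k_a} - \mathbb{E}(X^{(a)}_{i_a})^{k_a}\big)\Big\|_p$$

$$\le C_D\sum_{d=1}^D\sum_{a=1}^d\ \sum_{\substack{k_1,\dots,k_a>0\\ k_1+\dots+k_a = d}}\Big\|\sum_{\mathbf{i}\in[n]^{\underline{a}}}d^{(k_1,\dots,k_a)}_{i_1,\dots,i_a}\,\varepsilon^{(1)}_{i_1}(X^{(1)}_{i_1})^{k_1}\cdots\varepsilon^{(a)}_{i_a}(X^{(a)}_{i_a})^{k_a}\Big\|_p \tag{48}$$

(note that in the first part of Theorem 7.1 one does not impose any symmetry assumptions on
the functions $h_{\mathbf{i}}$).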

We will now use the following standard comparison lemma (for the reader's convenience its proof
is presented in the Appendix).

Lemma 5.4. For any positive integer $k$, if $Y_1,\dots,Y_n$ are independent symmetric variables with
$\|Y_i\|_{\psi_{2/k}} \le M$, then

$$\Big\|\sum_{i=1}^n a_iY_i\Big\|_p \le C_kM\Big\|\sum_{i=1}^n a_ig_{i1}\cdots g_{ik}\Big\|_p,$$

where $g_{ij}$ are i.i.d. $\mathcal{N}(0,1)$ variables.

Note that for any positive integer $k$ we have $\|X_i^k\|_{\psi_{2/k}} = \|X_i\|_{\psi_2}^k \le L^k$, so (48) together with
the above lemma (used repeatedly and conditionally) yield

$$\|f(X) - \mathbb{E}f(X)\|_p \le C_D\sum_{1\le d\le D}L^d\sum_{a=1}^d\ \sum_{\substack{k_1,\dots,k_a>0\\ k_1+\dots+k_a = d}}\Big\|\sum_{\mathbf{i}\in[n]^{\underline{a}}}d^{(k_1,\dots,k_a)}_{i_1,\dots,i_a}\big(g^{(1)}_{i_1,1}\cdots g^{(1)}_{i_1,k_1}\big)\cdots\big(g^{(a)}_{i_a,1}\cdots g^{(a)}_{i_a,k_a}\big)\Big\|_p, \tag{49}$$

where $(g^{(j)}_{i,k})$ is an array of i.i.d. standard Gaussian variables. Consider now multi-indexed
matrices $B_1,\dots,B_D$ defined as follows. For $1\le d\le D$ and a multi-index $\mathbf{r} = (r_1,\dots,r_d)\in[n]^d$
let $\mathcal{I} = \{I_1,\dots,I_a\}$ be the partition of $\{1,\dots,d\}$ into the level sets of $\mathbf{r}$ and $i_1,\dots,i_a$ be the
values corresponding to the level sets $I_1,\dots,I_a$. Define moreover

$$b^{(d)}_{r_1,\dots,r_d} = d^{(\#I_1,\dots,\#I_a)}_{i_1,\dots,i_a}$$

(note that thanks to (47) this definition does not depend on the order of $I_1,\dots,I_a$). Finally,
define the $d$-indexed matrix $B_d = (b^{(d)}_{\mathbf{r}})_{\mathbf{r}\in[n]^d}$.

Let us also define for $k_1,\dots,k_a>0$, $\sum_{i=1}^a k_i = d$, the partition $\mathcal{K}(k_1,\dots,k_a)\in P_d$ by splitting
the set $\{1,\dots,d\}$ into consecutive intervals of length $k_1,\dots,k_a$, i.e., $\mathcal{K} = \{K_1,\dots,K_a\}$, where
for $l = 1,\dots,a$, $K_l = \{1+\sum_{i=1}^{l-1}k_i,\ 2+\sum_{i=1}^{l-1}k_i,\ \dots,\ \sum_{i=1}^l k_i\}$.

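Applying Theorem 3.1 to the right hand side of (49), we obtain

$$\|f(X) - \mathbb{E}f(X)\|_p \le C_D\sum_{1\le d\le D}L^d\sum_{a=1}^d\ \sum_{\substack{k_1,\dots,k_a>0\\ k_1+\dots+k_a = d}}\Big\|\Big\langle B_d\circ\mathbf{1}_{L(\mathcal{K}(k_1,\dots,k_a))},\ \bigotimes_{j=1}^a\bigotimes_{k=1}^{k_j}\big(g^{(j)}_{i,k}\big)_{i\le n}\Big\rangle\Big\|_p$$

$$\le C_D\sum_{1\le d\le D}L^d\sum_{a=1}^d\ \sum_{\substack{k_1,\dots,k_a>0\\ k_1+\dots+k_a = d}}\ \sum_{\mathcal{J}\in P_d}p^{\#\mathcal{J}/2}\big\|B_d\circ\mathbf{1}_{L(\mathcal{K}(k_1,\dots,k_a))}\big\|_{\mathcal{J}}.$$

Note that for all $k_1,\dots,k_a$, by Corollary 5.3 we have $\|B_d\circ\mathbf{1}_{L(\mathcal{K}(k_1,\dots,k_a))}\|_{\mathcal{J}} \le C_d\|B_d\|_{\mathcal{J}}$. Thus
we obtain

$$\|f(X) - \mathbb{E}f(X)\|_p \le C_D\sum_{1\le d\le D}L^d\sum_{\mathcal{J}\in P_d}p^{\#\mathcal{J}/2}\|B_d\|_{\mathcal{J}}.$$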

Our next goal is to replace $B_d$ in the above inequality by $\mathbb E D^d f(X)$. To this end we will analyse the structure of the coefficients of $B_d$ and compare them with the integrated partial derivatives of $f$.

Let us first calculate $\mathbb E D^d f(X)$. Consider $\mathbf r \in [n]^d$ such that $i_1,\ldots,i_a$ are its distinct values, taken $l_1,\ldots,l_a$ times respectively. We have



« ..,. '.'i,^.. m = E E E E 



(J JL f-i * * * U Jbrp J 



kl>ll,...,ka>la a<m<D fea + l,--.,fcm>0 ia + l.---.J™ 

fel+... + fcm<-D (ii,...,im)e[n]22. 






^ j=l j=a+l j=l^ ^ ^'' . 

where we have used (46) . 

By comparing this with the definition of 6ri,...,rd and d\ ^'"^' " one can see that the sub- 
sum of the right hand side above corresponding to the choice ki = li, . . . ,ka = la is equal to 

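\[
a!\, l_1!\cdots l_a!\; b^{(d)}_{r_1,\ldots,r_d}.
\]

In particular for $d = D$, since $l_1 + \ldots + l_a = D$, we have

\[
\mathbb E\,\frac{\partial^D f}{\partial x_{r_1}\cdots\partial x_{r_D}}(X) = a!\, l_1!\cdots l_a!\; b^{(D)}_{r_1,\ldots,r_D}
\]

and so

\[
\|B_D\|_{\mathcal J} \le \sum_{\mathcal K\in P_D} \|B_D\circ \mathbf 1_{L(\mathcal K)}\|_{\mathcal J} \le \sum_{\mathcal K\in P_D} \|\mathbb E D^D f(X)\circ \mathbf 1_{L(\mathcal K)}\|_{\mathcal J} \le C_D\|\mathbb E D^D f(X)\|_{\mathcal J},
\]

where in the last inequality we used Corollary 5.3. Therefore if we prove that for all $d < D$ and all partitions $\mathcal I = \{I_1,\ldots,I_a\}$, $\mathcal J = \{J_1,\ldots,J_b\} \in P_d$,

\[
\big\|a!\,\#I_1!\cdots\#I_a!\,(B_d\circ \mathbf 1_{L(\mathcal I)}) - \mathbb E D^d f(X)\circ \mathbf 1_{L(\mathcal I)}\big\|_{\mathcal J} \le C_D \sum_{d<k\le D} L^{k-d} \sum_{\substack{\mathcal K\in P_k\\ \#\mathcal K=\#\mathcal J}} \|B_k\|_{\mathcal K}, \qquad (50)
\]

then by simple reverse induction (using again Corollary 5.3) we will obtain

\[
\sum_{1\le d\le D} L^d \sum_{\mathcal J\in P_d} p^{\#\mathcal J/2}\,\|B_d\|_{\mathcal J} \le C_D \sum_{1\le d\le D} L^d \sum_{\mathcal J\in P_d} p^{\#\mathcal J/2}\,\|\mathbb E D^d f(X)\|_{\mathcal J},
\]

which will end the proof of the theorem.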

Fix any d < D and partitions X = {/i, . . . , /«}, i7 = { Ji, . . . , Jb} G Pd- Denote k = #Ii. For 
every sequence ki, . . . ,ka such that ki > /j for i < a and there exists i < a such that ki > /«, 
let us define a d-indexed matrix Ej- ' ^'"'' "^ = (ei- ' ^'"'' " )re[nl'*i such that ei- ' ^'"'' "^ = if 
r ^ L(X) and for r G L(X), 



e 



E E E : 4''i,:';f;i,v,n«r'' n e^.' 



a<m<D fca + l,...,fem>0 ia + li---,im ^ ^ jf = l j = a+l 

ki + ... + km<D (ii,...,im)e[n]m 

where ii, . . . ,ia are the values of r corresponding to the level sets Ii, . . . ,Ia- We then have 

ki>li,...,ka>la 

Since we do not pay attention to constants depending only on $D$, by the above formula and the triangle inequality, to prove (50) it is enough to show that for all sequences $k_1,\ldots,k_a$ such that $k_i \ge l_i$ for $i \le a$ and there exists $i \le a$ such that $k_i > l_i$ one has

||^(dA,...,/c.)||^ < CnL^.<_^^'^-h)\\B,,^^,^,jK (51) 
for some partition $\mathcal K \in P_{k_1+\ldots+k_a}$ with $\#\mathcal K = \#\mathcal J$ (note that $\sum_{j\le a} l_j = d$). Therefore in what follows we will fix $k_1,\ldots,k_a$ as above and to simplify the notation we will write $E^{(d)}$ instead of



Fix therefore any partition $\tilde{\mathcal I} = \{\tilde I_1,\ldots,\tilde I_a\} \in P_{k_1+\ldots+k_a}$ such that $\#\tilde I_j = k_j$ and $I_j \subset \tilde I_j$ for all $j \le a$ (the specific choice of $\tilde{\mathcal I}$ is irrelevant). Finally define a $(k_1+\ldots+k_a)$-indexed matrix

^(fci+...+fc„) ^ (^'''^■■■^''"^)rGHd by setting 

In other words the new matrix is created by embedding the $d$-indexed matrix into a "generalized diagonal" of a $(k_1+\ldots+k_a)$-indexed matrix by adding $\sum_{j\le a}(k_j - l_j)$ new indices and assigning to them the values of old indices (for each $j \le a$ we add $k_j - l_j$ times the common value attained by $\mathbf r_{\{1,\ldots,d\}}$ on $I_j$).

Recall now the definition of the coefficients bj. and note that for any r G L(I) C ^fijki+---+ka 
we have er " = 6r " ni'=i "^-^i ^ ^ > where for j < a, ij is the value of r on its level set 

ij. This means that E(''^+-+ka) = {Bk,+...+k^ o 1^^^^ o (^^l+'-'+'^-i;,), where v^ = (Exf^-'Oi<n 
if s G {min/i, . . . ,min/a} and Vg = (1, . . . , 1) otherwise. Since ||ws||oo < {CdL)^^~^^ if s G 
{min/j}j<a and ||ws||oo = 1 otherwise, by Lemma 5.1 this implies that for any /C G ^^^-1-...+^^, 



(53) 

where in the last inequality we used Corollary 5.3. 

We will now use the above inequality to prove (51). Consider the unique partition $\mathcal K = \{K_1,\ldots,K_b\}$ satisfying the following two conditions:

• for each $j \le b$, $J_j \subset K_j$,

• for each $s \in \{d+1,\ldots,k_1+\ldots+k_a\}$, if $s \in \tilde I_j$ and $\pi(s) := \min \tilde I_j \in J_k$, then $s \in K_k$. In other words all indices $s$ which in the construction of $\tilde{\mathcal I}$ were added to $I_j$ (i.e., elements of $\tilde I_j \setminus I_j$) are now added to the unique element of $\mathcal J$ containing $\pi(s) = \min \tilde I_j = \min I_j$.

Now, it is easy to see that ||-E^'^'||j' < ||£^(^i+'''+^"'||x:. Indeed, consider an arbitrary x^^' = 
ixrj^)\rj^\<n, J = 1, • • • , &, Satisfying ||x(^)||2 < 1. Define y^J^ = iyrK^)\rK^\<n, J = 1, • • • ,& with 
the formula 

UrKj = Xrj(^nld] [[ ^{rs=r^^s)}- 
seKj\[d\ 

We have ||y^-''||2 = Ha^'-^'lb ^ 1- Moreover, by the construction of the matrix ^(^i+'-'+'^o) (recall 
(52)), we have 

6 6 6 

'S^ ^WTTt^^')- \^ z(''^ + -+ka)TT (j) _ V^ ^(ki+-+ka)TT (j) 



(in the last equality we used the fact that if r G -^(/), then for s > d, r^/g\ = Vg and so y)-'^ = 

•^^K-nid] ~ ^r/ )• By taking the supremum over x^^' one thus obtains HE'' •'Hj- < ||ii^'^+'"+ "•'Hx;. 
Combining this inequality with (53) proves (51) and thus (50). This ends the proof of Theorem 
1.4. 
5.3 Application: Subgraph counting in random graphs

We will now apply results from Section 5 to some special cases of the problem of subgraph counting in Erdős-Rényi random graphs $G(n,p)$, which is often used as a test model for deviation inequalities for polynomials in independent random variables. More specifically we will investigate the problem of counting cycles of fixed length.

In some ranges of parameters Theorem 1.4 gives optimal inequalities (leading to improvements of known results), whereas in other regimes the estimates it gives are suboptimal.

Let us first describe the setting (we will do it in a slightly more general form than needed for our example). We will consider undirected graphs $G = (V, E)$, where $V$ is a finite set of vertices and $E$ is the set of edges (i.e. two-element subsets of $V$). By $V_G = V(G)$ and $E_G = E(G)$ we mean the set of vertices and edges (respectively) of a graph $G$. Also, $v_G = v(G)$ and $e_G = e(G)$ denote the number of vertices and edges in $G$. We say that a graph $H$ is a subgraph of a graph $G$ if $V_H \subseteq V_G$ and $E_H \subseteq E_G$ (thus a subgraph is not necessarily induced). Graphs $H$ and $G$ are isomorphic if there is a bijection $\pi: V_H \to V_G$ such that for all distinct $v, w \in V_H$, $\{\pi(v),\pi(w)\} \in E_G$ iff $\{v,w\} \in E_H$.

For $p \in [0,1]$ consider now the Erdős-Rényi random graph $G = G(n,p)$, i.e., a graph with $n$ vertices (we will assume that $V_G = [n]$) whose edges are selected independently at random with probability $p$. In what follows we will be concerned with the number of copies of a given graph $H = ([k], E_H)$ in the graph $G$, i.e., the number of subgraphs of $G$ which are isomorphic to $H$. We will denote this random variable by $Y_H(n,p)$. To relate $Y_H(n,p)$ to polynomials, let us consider the family $C(n,2)$ of two-element subsets of $[n]$ and the family of independent random variables $X = (X_e)_{e\in C(n,2)}$ such that $\mathbb P(X_e = 1) = 1 - \mathbb P(X_e = 0) = p$ (i.e., $X_e$ indicates whether the edge $e$ has been selected or not). Denote moreover by $\mathrm{Aut}(H)$ the group of isomorphisms of $H$ into itself and note that

\[
Y_H(n,p) = \frac{1}{\#\mathrm{Aut}(H)} \sum_{\mathbf i\in[n]^{\underline k}}\ \prod_{\substack{v,w\in[k]\\ v<w,\ \{v,w\}\in E(H)}} X_{\{i_v,i_w\}}.
\]

The right-hand side above is a homogeneous tetrahedral polynomial of degree $e_H$. Moreover the variables $X_{\{v,w\}}$ satisfy

\[
\mathbb E\exp\big(X_{\{v,w\}}^2\log(1/p)\big) = 1 - p + p\cdot\frac1p \le 2
\]

and

\[
\mathbb E\exp\big(X_{\{v,w\}}^2\log 2\big) \le 2,
\]

which implies that $\|X_{\{v,w\}}\|_{\psi_2} \le (\log(1/p))^{-1/2} \wedge (\log 2)^{-1/2} \le \sqrt2\,(\log(2/p))^{-1/2}$.
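We can thus apply Theorem 1.4 to $Y_H(n,p)$ and obtain

\[
\mathbb P\big(|Y_H(n,p) - \mathbb E Y_H(n,p)| \ge t\big) \le 2\exp\Big(-\frac{1}{C_{e_H}}\min_{1\le d\le e_H}\ \min_{\mathcal J\in P_d}\Big(\frac{t}{L_p^d\,\|\mathbb E D^d f(X)\|_{\mathcal J}}\Big)^{2/\#\mathcal J}\Big), \qquad (54)
\]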

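where $L_p = \sqrt2\,(\log(2/p))^{-1/2}$ and $f: \mathbb R^{C(n,2)} \to \mathbb R$ is given by

\[
f\big((x_e)_{e\in C(n,2)}\big) = \frac{1}{\#\mathrm{Aut}(H)} \sum_{\mathbf i\in[n]^{\underline k}}\ \prod_{\substack{v,w\in[k]\\ v<w,\ \{v,w\}\in E(H)}} x_{\{i_v,i_w\}}.
\]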
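The polynomial representation of $Y_H(n,p)$ can be checked on a small deterministic instance. The sketch below is illustrative only: the function names are hypothetical, vertices are labelled $0,\ldots,n-1$ rather than $1,\ldots,n$, and the brute-force enumeration is feasible only for small $n$ and $k$. It evaluates the normalized sum over sequences of $k$ distinct vertices and compares it with a direct triangle count.

```python
from itertools import combinations, permutations


def automorphism_count(k, edges_h):
    """Number of permutations of the vertex set of H that preserve its edge set."""
    target = {frozenset(e) for e in edges_h}
    return sum(
        1
        for perm in permutations(range(k))
        if {frozenset((perm[v], perm[w])) for v, w in edges_h} == target
    )


def copies_via_polynomial(n, x, k, edges_h):
    """Evaluate (1/#Aut(H)) * sum over injective i of prod_{{v,w} in E(H)} x[{i_v, i_w}]."""
    total = 0
    for i in permutations(range(n), k):  # sequences of k distinct vertices
        prod = 1
        for v, w in edges_h:
            prod *= x[frozenset((i[v], i[w]))]
        total += prod
    # For 0/1 inputs every copy is counted exactly #Aut(H) times, so the division is exact.
    return total // automorphism_count(k, edges_h)


def triangles_directly(n, x):
    """Count triangles by enumerating vertex triples."""
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if x[frozenset((a, b))] and x[frozenset((b, c))] and x[frozenset((a, c))]
    )


# Toy instance (an assumption for illustration): the complete graph on 5 vertices minus the edge {0, 1}.
n = 5
x = {frozenset(e): 1 for e in combinations(range(n), 2)}
x[frozenset((0, 1))] = 0
triangle = [(0, 1), (1, 2), (0, 2)]

print(copies_via_polynomial(n, x, 3, triangle), triangles_directly(n, x))
```

Replacing the 0/1 values in `x` by independent Bernoulli($p$) samples gives one realization of $Y_H(n,p)$.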

Deviation inequalities for subgraph counts have been studied by many authors, to mention [36, 35, 63, 34, 37, 22, 26, 25]. As it turns out the lower tail $\mathbb P(Y_H(n,p) \le \mathbb E Y_H(n,p) - t)$ is easier than the upper tail $\mathbb P(Y_H(n,p) \ge \mathbb E Y_H(n,p) + t)$. The lower tail turns out to be also lighter than the upper one. Since our inequalities concern $|Y_H(n,p) - \mathbb E Y_H(n,p)|$, we cannot hope to recover optimal lower tail estimates, however we can still hope to get bounds which in some range of parameters $n, p$ will agree with optimal upper tail estimates.

Of particular importance in the literature is the law of large numbers regime, i.e., the case when $t = \varepsilon\mathbb E Y_H(n,p)$. In [34] the authors prove that for every $\varepsilon > 0$ such that $\mathbb P(Y_H(n,p) \ge (1+\varepsilon)\mathbb E Y_H(n,p)) > 0$,

\[
\exp\big(-C(H,\varepsilon)M^*_H(n,p)\log(1/p)\big) \le \mathbb P\big(Y_H(n,p) \ge (1+\varepsilon)\mathbb E Y_H(n,p)\big) \le \exp\big(-c(H,\varepsilon)M^*_H(n,p)\big) \qquad (55)
\]

for certain constants $c(H,\varepsilon), C(H,\varepsilon)$ and a certain function $M^*_H(n,p)$. Since the general definition of $M^*_H$ is rather involved we will skip the details (in the examples considered in the sequel we will provide specific formulas). Note that if one disregards the constants depending only on $H$ and $\varepsilon$, the lower and upper estimate above differ by the factor $\log(1/p)$ in the exponent. To the best of our knowledge, providing a lower and upper bound for general $H$ which would agree up to multiplicative constants in the exponent (depending only on $H$ and $\varepsilon$, but not on $n$ or $p$) is an open problem.

We will now specialize to the case when $H$ is a cycle. For simplicity we will first present the case of the triangle $K_3$ (the clique with three vertices). For this graph the upper bound from [34] has been recently strengthened to match the lower one (up to a constant depending only on $\varepsilon$) by Chatterjee [22] and DeMarco and Kahn [26] (who also obtained a similar result for general cliques [25]). In the next section we show that if $p$ is not too small, the inequality (54) also allows one to recover the optimal upper bound. In Section 5.3.2 we provide an upper bound for cycles of arbitrary (fixed) length $k$, which is optimal for $p \ge n^{-\frac{k-2}{2(k-1)}}\log^{-\frac12} n$.



5.3.1 Counting triangles 

Assume that $H = K_3$ and let us analyse the behaviour of $\|\mathbb E D^d f(X)\|_{\mathcal J}$ for $d = 1, 2, 3$. Of course in this case $\#\mathrm{Aut}(H) = 6$.

We have for any $e = \{v,w\}$, $v \ne w \in [n]$,

\[
\frac{\partial}{\partial x_e} f(X) = \sum_{i\in[n]\setminus\{v,w\}} X_{\{i,v\}}X_{\{i,w\}}
\]

and so $\|\mathbb E Df(X)\|_{\{1\}} = (n-2)p^2\sqrt{n(n-1)/2} \le n^2p^2$.
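As a sanity check of the first-order norm, the expected gradient can be assembled entry by entry and its Euclidean norm compared with the closed form. This is a minimal sketch with hypothetical names and arbitrarily chosen values of $n$ and $p$; vertices are labelled from $0$.

```python
from itertools import combinations
from math import isclose, sqrt


def expected_gradient_norm(n, p):
    """Euclidean norm of E Df(X) for the triangle polynomial, computed entry by entry."""
    entries = []
    for v, w in combinations(range(n), 2):
        # E (d f / d x_{v,w}) = sum over third vertices i of E[X_{i,v} X_{i,w}] = (n - 2) p^2
        entries.append(sum(p * p for i in range(n) if i not in (v, w)))
    return sqrt(sum(a * a for a in entries))


n, p = 12, 0.3  # illustrative values only
norm = expected_gradient_norm(n, p)
closed_form = (n - 2) * p**2 * sqrt(n * (n - 1) / 2)
print(isclose(norm, closed_form, rel_tol=1e-9), norm <= n**2 * p**2)
```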

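For $e_1 = e_2$ or when $e_1$ and $e_2$ do not have a common vertex, we have $\frac{\partial^2}{\partial x_{e_1}\partial x_{e_2}} f = 0$, whereas for $e_1, e_2$ sharing exactly one vertex, we have

\[
\frac{\partial^2}{\partial x_{e_1}\partial x_{e_2}} f(X) = X_{\{v,w\}},
\]

where $v, w$ are the vertices of $e_1, e_2$ distinct from the common one. Therefore

\[
\mathbb E D^2 f(X) = p\,\big(\mathbf 1_{\{e_1, e_2 \text{ have exactly one common vertex}\}}\big)_{e_1,e_2\in C(n,2)}.
\]

Using the fact that $\mathbb E D^2 f(X)$ is symmetric and for each $e_1$ the sum of entries of $\mathbb E D^2 f(X)$ in the row corresponding to $e_1$ equals $2p(n-2)$, we obtain $\|\mathbb E D^2 f(X)\|_{\{1\}\{2\}} = 2p(n-2) \le 2pn$. One can also easily see that $\|\mathbb E D^2 f(X)\|_{\{1,2\}} = p\sqrt{n(n-1)(n-2)} \le pn^{3/2}$.

Finally

\[
\frac{\partial^3}{\partial x_{e_1}\partial x_{e_2}\partial x_{e_3}} f = \mathbf 1_{\{e_1, e_2, e_3 \text{ form a triangle}\}}
\]

and thus $\|\mathbb E D^3 f(X)\|_{\{1,2,3\}} = \sqrt{n(n-1)(n-2)} \le n^{3/2}$. Moreover, due to symmetry we have

\[
\|\mathbb E D^3 f(X)\|_{\{1,2\}\{3\}} = \|\mathbb E D^3 f(X)\|_{\{1,3\}\{2\}} = \|\mathbb E D^3 f(X)\|_{\{2,3\}\{1\}}.
\]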
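The two second-order norms can be verified numerically for a small $n$. The sketch below uses hypothetical helper names and pure Python; it builds the matrix $\mathbb E D^2 f(X)$, estimates its operator norm by power iteration (which converges here because the matrix is symmetric, nonnegative and, for $n \ge 4$, irreducible and aperiodic), and computes the Hilbert-Schmidt norm directly.

```python
from itertools import combinations
from math import isclose, sqrt


def expected_hessian(n, p):
    """Matrix E D^2 f(X) for triangles: entry p when two edges share exactly one vertex."""
    edges = [frozenset(e) for e in combinations(range(n), 2)]
    return [[p if len(e1 & e2) == 1 else 0.0 for e2 in edges] for e1 in edges]


def operator_norm(matrix, iterations=200):
    """Largest eigenvalue in absolute value, approximated by power iteration."""
    m = len(matrix)
    vec = [1.0 + j / m for j in range(m)]  # generic positive starting vector
    value = 0.0
    for _ in range(iterations):
        image = [sum(a * b for a, b in zip(row, vec)) for row in matrix]
        norm_vec = sqrt(sum(c * c for c in vec))
        value = sqrt(sum(c * c for c in image)) / norm_vec
        vec = [c / (value * norm_vec) for c in image]
    return value


def hilbert_schmidt_norm(matrix):
    """Square root of the sum of squared entries."""
    return sqrt(sum(a * a for row in matrix for a in row))


n, p = 6, 0.5  # illustrative values only
hessian = expected_hessian(n, p)
print(operator_norm(hessian), 2 * p * (n - 2))
print(hilbert_schmidt_norm(hessian), p * sqrt(n * (n - 1) * (n - 2)))
```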
Consider arbitrary $(x_{e_1})_{e_1\in C(n,2)}$ and $(y_{e_2,e_3})_{e_2,e_3\in C(n,2)}$ of norm one. We have

\[
\begin{aligned}
\sum_{e_1,e_2,e_3} \mathbf 1_{\{e_1,e_2,e_3 \text{ form a triangle}\}}\, x_{e_1} y_{e_2,e_3}
&\le \Big(\sum_{e_1}\Big(\sum_{e_2,e_3} \mathbf 1_{\{e_1,e_2,e_3 \text{ form a triangle}\}}\, y_{e_2,e_3}\Big)^2\Big)^{1/2} \\
&\le \Big(\sum_{e_1}\Big(\sum_{e_2,e_3} \mathbf 1_{\{e_1,e_2,e_3 \text{ form a triangle}\}}\Big)\Big(\sum_{e_2,e_3} \mathbf 1_{\{e_1,e_2,e_3 \text{ form a triangle}\}}\, y_{e_2,e_3}^2\Big)\Big)^{1/2} \\
&= \sqrt{2(n-2)}\,\Big(\sum_{e_2,e_3} y_{e_2,e_3}^2 \sum_{e_1} \mathbf 1_{\{e_1,e_2,e_3 \text{ form a triangle}\}}\Big)^{1/2} \le \sqrt{2(n-2)},
\end{aligned}
\]

where the first two inequalities follow by the Cauchy-Schwarz inequality and the last one from the fact that for each $e_2, e_3$ there is at most one $e_1$ such that $e_1, e_2, e_3$ form a triangle. We have thus obtained $\|\mathbb E D^3 f(X)\|_{\{1,2\}\{3\}} = \|\mathbb E D^3 f(X)\|_{\{1,3\}\{2\}} = \|\mathbb E D^3 f(X)\|_{\{2,3\}\{1\}} \le \sqrt{2n}$.
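The two counting facts used in this estimate (each edge is completed to a triangle by $2(n-2)$ ordered pairs of edges, and each ordered pair of edges is completed by at most one edge) can be checked through the Gram matrix $MM^{T}$ of the flattening $M_{e_1,(e_2,e_3)} = \mathbf 1_{\{e_1,e_2,e_3 \text{ form a triangle}\}}$. The following is a small sketch with hypothetical names; a diagonal Gram matrix with entries $2(n-2)$ means that the operator norm of $M$ is exactly $\sqrt{2(n-2)}$.

```python
from itertools import combinations, permutations
from math import sqrt


def triangle_gram(n):
    """Gram matrix M M^T, where M[e1][(e2, e3)] = 1 iff e1, e2, e3 form a triangle."""
    edges = [frozenset(e) for e in combinations(range(n), 2)]
    index = {e: j for j, e in enumerate(edges)}
    # For every column (e2, e3), collect the rows e1 with a nonzero entry.
    rows_by_column = {}
    for a, b, c in combinations(range(n), 3):
        tri = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        for e1, e2, e3 in permutations(tri):
            rows_by_column.setdefault((e2, e3), []).append(e1)
    gram = [[0] * len(edges) for _ in edges]
    for rows in rows_by_column.values():
        for e1 in rows:
            for f1 in rows:
                gram[index[e1]][index[f1]] += 1
    return gram


n = 6  # illustrative value only
gram = triangle_gram(n)
diagonal = {gram[j][j] for j in range(len(gram))}
off_diagonal = {gram[i][j] for i in range(len(gram)) for j in range(len(gram)) if i != j}
print(diagonal, off_diagonal, sqrt(2 * (n - 2)) <= sqrt(2 * n))
```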

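It remains to estimate $\|\mathbb E D^3 f(X)\|_{\{1\}\{2\}\{3\}}$. For all $(x_e)_{e\in C(n,2)}$, $(y_e)_{e\in C(n,2)}$, $(z_e)_{e\in C(n,2)}$ of norm one we have by the Cauchy-Schwarz inequality

\[
\begin{aligned}
\sum_{e_1,e_2,e_3} \mathbf 1_{\{e_1,e_2,e_3 \text{ form a triangle}\}}\, x_{e_1}y_{e_2}z_{e_3}
&= \sum_{(i_1,i_2,i_3)\in[n]^{\underline 3}} x_{\{i_1,i_2\}}\,y_{\{i_2,i_3\}}\,z_{\{i_1,i_3\}} \\
&\le \sum_{i_1\in[n]} \Big(\sum_{(i_2,i_3)\in([n]\setminus\{i_1\})^{\underline 2}} x_{\{i_1,i_2\}}^2 z_{\{i_1,i_3\}}^2\Big)^{1/2} \Big(\sum_{(i_2,i_3)\in([n]\setminus\{i_1\})^{\underline 2}} y_{\{i_2,i_3\}}^2\Big)^{1/2} \\
&\le \sqrt2 \sum_{i_1\in[n]} \Big(\sum_{i_2\in[n]\setminus\{i_1\}} x_{\{i_1,i_2\}}^2\Big)^{1/2} \Big(\sum_{i_3\in[n]\setminus\{i_1\}} z_{\{i_1,i_3\}}^2\Big)^{1/2} \\
&\le \sqrt2\, \Big(\sum_{(i_1,i_2)\in[n]^{\underline 2}} x_{\{i_1,i_2\}}^2\Big)^{1/2} \Big(\sum_{(i_1,i_3)\in[n]^{\underline 2}} z_{\{i_1,i_3\}}^2\Big)^{1/2} = 2\sqrt2,
\end{aligned}
\]

which gives $\|\mathbb E D^3 f(X)\|_{\{1\}\{2\}\{3\}} \le 2\sqrt2$.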

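Using (54) together with the above estimates, we obtain

Proposition 5.5. For any $t > 0$,

\[
\mathbb P\big(|Y_{K_3}(n,p) - \mathbb E Y_{K_3}(n,p)| \ge t\big) \le 2\exp\Big(-\frac1C\min\Big(\frac{t^2}{L_p^6n^3 + L_p^4p^2n^3 + L_p^2p^4n^4},\ \frac{t}{L_p^3n^{1/2} + L_p^2pn},\ \frac{t^{2/3}}{L_p^2}\Big)\Big),
\]

where $L_p = (\log(2/p))^{-1/2}$.

In particular for $t = \varepsilon\mathbb E Y_{K_3}(n,p) = \varepsilon\binom n3p^3$,

\[
\mathbb P\big(|Y_{K_3}(n,p) - \mathbb E Y_{K_3}(n,p)| \ge \varepsilon\mathbb E Y_{K_3}(n,p)\big) \le 2\exp\Big(-\frac1C\min\big(\varepsilon^2n^3p^6\log^3(2/p),\ (\varepsilon^2\wedge\varepsilon^{2/3})\,n^2p^2\log(2/p)\big)\Big).
\]

Thus for $p \ge n^{-1/4}\log^{-1/2} n$ we obtain

\[
\mathbb P\big(|Y_{K_3}(n,p) - \mathbb E Y_{K_3}(n,p)| \ge \varepsilon\mathbb E Y_{K_3}(n,p)\big) \le 2\exp\Big(-\frac1C(\varepsilon^2\wedge\varepsilon^{2/3})\,n^2p^2\log(2/p)\Big).
\]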
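To see which term of the minimum in Proposition 5.5 is active for concrete parameters, one can evaluate the three candidate exponents numerically. The sketch below is only indicative: the absolute constant $C$ is dropped, the dictionary keys are hypothetical labels, and the parameter values are chosen arbitrarily with $p$ above the threshold $n^{-1/4}\log^{-1/2}n$.

```python
from math import comb, log


def exponent_terms(n, p, eps):
    """The three terms of the minimum in the triangle bound, with t = eps * E Y and C omitted."""
    l_sq = 1.0 / log(2.0 / p)          # L_p^2, where L_p = (log(2/p))^(-1/2)
    t = eps * comb(n, 3) * p**3        # t = eps * binom(n, 3) * p^3
    return {
        "quadratic": t**2 / (l_sq**3 * n**3 + l_sq**2 * p**2 * n**3 + l_sq * p**4 * n**4),
        "linear": t / (l_sq**1.5 * n**0.5 + l_sq * p * n),
        "two_thirds": t ** (2.0 / 3.0) / l_sq,
    }


n, p, eps = 10**4, 0.5, 0.5  # illustrative values only
terms = exponent_terms(n, p, eps)
active = min(terms, key=terms.get)
reference = min(eps**2, eps ** (2.0 / 3.0)) * n**2 * p**2 * log(2.0 / p)
print(active, terms[active] / reference)
```

For these values the active term is of the order $n^2p^2\log(2/p)$, up to the numerical factor coming from $\binom n3 \approx n^3/6$, in agreement with the last display.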

By Corollary 1.7 in [34], if $p \ge 1/n$, then $\frac1Cn^2p^2 \le M^*_{K_3}(n,p) \le Cn^2p^2$ (recall (55)) and so for $p \ge n^{-1/4}\log^{-1/2} n$ the estimate obtained from the above proposition is optimal. As already mentioned, the optimal estimate has been recently obtained in the full range of $p$ by Chatterjee, DeMarco and Kahn. Unfortunately it seems that using our general approach we are not able to recover the full strength of their result. From Proposition 5.5 one can also see that Theorem 1.4, when specialized to polynomials in 0-1 random variables, is not directly comparable with the
family of Kim-Vu inequalities. As shown in [35] (see table 2 therein), various inequalities by Kim 
and Vu give for the triangle counting problem exponents —Tam{n^'^p^'^,n^''^p^''^), — n^'^p^'^, 
—np (disregarding logarithmic factors). Thus for "large" p our inequality performs better than 
those by Kim-Vu, whereas for "small" $p$ this is not the case (note that the Kim-Vu inequalities give meaningful bounds for $p \ge Cn^{-1}$ while ours only for $p \ge Cn^{-1/2}$). As already mentioned in the introduction, the fact that our inequalities degenerate for small $p$ is not surprising, as even for sums of independent 0-1 random variables, when $p$ becomes small, general inequalities for sums of independent random variables with sub-Gaussian tails do not recover the correct tail behaviour (the $\|\cdot\|_{\psi_2}$ norm of the summands becomes much larger than the variance).

5.3.2 Counting cycles 

We will now generalize Proposition 5.5 to cycles of arbitrary length. If $H$ is a cycle of length $k$, then by Corollary 1.7 in [34], $\frac1Cn^2p^2 \le M^*_H(n,p) \le Cn^2p^2$ for $p \ge 1/n$. Thus the bounds for the upper tail from (55) imply that for $p \ge 1/n$,

\[
\exp\big(-C(k,\varepsilon)\,n^2p^2\log(1/p)\big) \le \mathbb P\big(Y_H(n,p) \ge (1+\varepsilon)\mathbb E Y_H(n,p)\big) \le \exp\big(-c(k,\varepsilon)\,n^2p^2\big)
\]

for every $\varepsilon > 0$ for which the above probability is not zero.

We will show that, similarly as for triangles, Theorem 1.4 allows one to strengthen the upper bound if $p$ is not too small with respect to $n$. More precisely, we have the following

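Proposition 5.6. Let $H$ be a cycle of length $k$. Then for every $t > 0$,

\[
\mathbb P\big(|Y_H(n,p) - \mathbb E Y_H(n,p)| \ge t\big) \le 2\exp\Big(-\frac{1}{C_k}\Big(\frac{t^2}{L_p^{2k}n^k} \wedge \min_{\substack{1\le l\le d\le k\\ d<k \text{ or } l>1}} \frac{t^{2/l}}{L_p^{2d/l}\,p^{2(k-d)/l}\,n^{(2k-d-l)/l}}\Big)\Big),
\]

where $L_p = (\log(2/p))^{-1/2}$. In particular for every $\varepsilon > 0$ and $p \ge n^{-\frac{k-2}{2(k-1)}}\log^{-\frac12} n$,

\[
\mathbb P\big(Y_H(n,p) \ge (1+\varepsilon)\mathbb E Y_H(n,p)\big) \le 2\exp\Big(-\frac{1}{C_k}(\varepsilon^2\wedge\varepsilon^{2/k})\,n^2p^2\log(2/p)\Big).
\]

To prove the above proposition we need to estimate the corresponding $\|\cdot\|_{\mathcal J}$ norms. Since a major part of the argument does not rely on the fact that $H$ is a cycle and bounds on $\|\cdot\|_{\mathcal J}$ norms may be of independent interest, we will now consider arbitrary graphs. Let thus $H$ be a fixed graph with no isolated vertices.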

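Similarly to [34], it will be more convenient to count "ordered" copies of a graph $H$ in $G(n,p)$. Namely, for $H = ([k], E(H))$, each sequence of $k$ distinct vertices in the clique $K_n$, $\mathbf i \in [n]^{\underline k}$, determines an ordered copy $G_{\mathbf i}$ of $H$ in $K_n$, where $G_{\mathbf i} = \mathbf i(H)$, i.e., $V(G_{\mathbf i}) = \mathbf i([k])$ and $E(G_{\mathbf i}) = \{\mathbf i(e): e \in E(H)\} = \{\{i_u,i_v\}: \{u,v\} \in E(H)\}$. Define

\[
X_H(n,p) := \sum_{\mathbf i\in[n]^{\underline k}} \mathbf 1_{\{G_{\mathbf i}\subseteq G(n,p)\}} = \sum_{\mathbf i\in[n]^{\underline k}}\ \prod_{e\in E(G_{\mathbf i})} X_e.
\]

Clearly $X_H(n,p) = \#\mathrm{Aut}(H)\,Y_H(n,p)$ and $X_H(n,p) = f(X)$ where

\[
f(x) := \sum_{\mathbf i\in[n]^{\underline k}}\ \prod_{e\in E(G_{\mathbf i})} x_e = \sum_{\mathbf i\in[n]^{\underline k}}\ \prod_{e\in E(H)} x_{\mathbf i(e)}. \qquad (56)
\]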
ie[n]fe ee£;(Gi) ie[n]h. eeE{H) 

A sequence of distinct edges (ei,...,erf) G E{Kn)- determines a subgraph Go C K^ with 
l'(Go) = Utl Ci, B(Go) = {ei. . . . ,5i). Note that 

«°"^(--' ^= a/^ = T. n -^ 

"^ '''* ieM^: Gi3Go e(iE{Gi)\E{Go) 
and thus

Consider e = (ei, . . . , e^) G E{H)- and let i:ro(e) be the subgraph of H with V{HQ{e)) = \Jl^i Cj, 
£;(i7o(e)) = {ei,...,erf}. Clearly, for any i G [n]^ i(i:^o(e)) C Gi. We write (ei,...erf) ~ 
(ei,...,e(i) if there exists i G [n]- such that i(ej) = Cj for j = l,...,d. Note that given 
(ei, . . . , Cfi) G E[Kn)- and the corresponding graph Gq, 

#{i G N^: Go C Gi} = j; #{i G N^: i(e,) = e,- for j = 1, . . . , d} 

e'=E{H)!L 

^ ^ 2-(^°(-))(n - Kgo(e))) ^-(^"(-)) l{(e-„...,,,).e}, 

eeE(//)^ 

where for a graph G, f (G) is the number of vertices of G and s{G) is the number of edges in G 
with no other adjacent edge. Therefore, 

Let iT" be a partition of [d\. By the triangle inequality for the norms \\-\\j^ 



ED"'/(X) < p'^W-^ y^ 2^''^'^^^^^''n^^^^^^^^^^'' 



J 



l-{(ei,...,ed)~e};(g^^^^^g^) 



})(ei. 



e£E{H)^ 

The norms appearing on the right hand side of (57) are handled by the following 



J 



(57) 



Lemma 5.7. Fix 1 < d < e{H), e = (ei, . . . , e^^) G E{H)^ and J = { Ji, . . . , J;} G Pd- Let 
Hq = Ho{e) and for r = l,...,l, let Hr be a subgraph of Hq spanned by the set of edges 
{ej-. j G Jr}- Then, 



'-{{ei,...,ed)~{ei,...,ed)}j(e^...^ed) 



J 



< 2~s{H„)+h,T.l=is(Hr) 



X n2 



l#{v£V{Ho) ■■ V e V{Hr) for exactly one r e [/]} 



Proof. We shall bound the sum 



J2 ^ii^i' 

ei,...,ea£E{Kn) 






(58) 



under the constraints JZ/z \ ,zt^(w 



Sr) 



\Jt I -J^t- \ I < 1 for r = 1, . . . , L Note that we can 

assume x^^' > for all r G [/]. Rewrite the sum (58) as the sum over a sequence of vertices 
instead of edges: 






where for two sets A, B, A— is the set of 1-1 functions from B to A. Further note that it is 
enough to prove the desired bound for the sum 



I 






(r) 

V(Hr) 



(59) 
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under the constraints 2 ''^ J2i(z\n]YlEil\yi ) < 1 for each r = l,...,l. Indeed, given 

x's, for each r = 1,...,/ and ah i G In] ^ ^'' take y- = xJ., •, ^ and notice that the 

sum (59) equals the sum (58) while the constraints for x's imply the constraints for y's. Finally, 
by homogeneity and the fact that the sum (59) does not depend on the full graph structure but 
only on the sets of vertices of the graphs Hr, the lemma will follow from the statement: For a 
sequence of finite, non-empty sets Vi, . . . ,Vi, let V = ViU . . .UVi. Then 



/ 






,(r)^2 



for y^^' , . . . , y*-'-* > satisfying 

ie[n]-5:t 

We prove (60) by induction on $\#V$. For $V = \emptyset$ (and $l = 0$), (60) holds trivially. For the induction step fix any $v_0 \in V$ and put $R = \{r \in [l]: v_0 \in V_r\}$. We write

E ri^s;:= E (I n^-n e Hvi: 

ieH^ r=l ie[„]nilol V \re[/]\i? / i.o<^[n]\i{V\{vo}) reR 

We bound the inner sum using the Cauchy-Schwarz inequality. If ^R > 2, we get 

E n^^m E (cr" 

i„oeH\i(v\{''o}) '-e-R reiJ \i„o6M\i{n{^o}) 

and if i? = {ro} then 

E <.iM E «)V. 

i„oe[n]\i{n{^o}) \i.oeN\i(n{^o}) / 

Now, for each r £ R put Wr = Vr \ {vq} a-i^d define 

4Z=i E (^S)') for alllH.. € Nl^. 

\iv,,e[n]\iiWr) J 

Note that if $W_r = \emptyset$ then $z^{(r)}$ is a scalar and by (61), $0 \le z^{(r)} \le 1$. For $r \in [l]\setminus R$, just put $W_r = V_r$ and $z^{(r)} = y^{(r)}$. Let $L = \{r \in [l]: W_r \ne \emptyset\}$. Combining the estimates obtained above, we arrive at

Now we use the induction hypothesis for the sequence of sets {Wr)r&L and the vectors z^"^', r £ L 
(note that EieMW^(^[i;J^)2<l). D 
Remark. The bound in Lemma 5.7 is essentially optimal, at least for large $n$, say $n \ge 2k$. To see this let us analyse optimality of (60) under the constraints (61) (it is easy to see that this is equivalent to the optimality in the original problem). Denote $V_0 = \{v \in V: v \in V_r \text{ for exactly one } r \in [l]\}$. Fix any $\mathbf i^{(0)} \in [n]^{\underline{V\setminus V_0}}$. Then for $r = 1,\ldots,l$ take



yiv, 



otherwise. 



The vectors y''^' satisfy the constraints (61) and 



(n-#(y\Vb))^n-5#^o > (^/2)#^'n-5#^o = 2-#^On5#^o. 



Combining Lemma 5.7 with (57) we obtain 

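Lemma 5.8. Let $H$ be any graph with $k$ vertices, none of which is isolated, and let $f$ be defined by (56). Then for any $1 \le d \le e(H)$ and any $\mathcal J = \{J_1,\ldots,J_l\} \in P_d$,

\[
\|\mathbb E D^d f(X)\|_{\mathcal J} \le p^{e(H)-d} \sum_{\mathbf e\in E(H)^{\underline d}} 2^{\frac12\sum_{r=1}^{l} s(H_r(\mathbf e))}\; n^{\,k - v(H_0(\mathbf e)) + \frac12\#\{v\in V(H_0(\mathbf e)):\ v\in V(H_r(\mathbf e)) \text{ for exactly one } r\in[l]\}},
\]

where for $\mathbf e \in E(H)^{\underline d}$ and $r \in [l]$, $H_r(\mathbf e)$ is the subgraph of $H_0(\mathbf e)$ spanned by $\{e_j: j \in J_r\}$.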

We are now ready for 

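Proof of Proposition 5.6. We will use Lemma 5.8 to estimate $\|\mathbb E D^d f(X)\|_{\mathcal J}$ for any $d \le k$ and $\mathcal J \in P_d$ with $\#\mathcal J = l$. Note that for any $\mathbf e \in E(H)^{\underline d}$,

\[
\begin{aligned}
v(H_0(\mathbf e)) &- \tfrac12\#\{v\in V(H_0(\mathbf e)):\ v\in V(H_r(\mathbf e)) \text{ for exactly one } r\in[l]\} \\
&= \tfrac12\Big(v(H_0(\mathbf e)) + \#\{v\in V(H_0(\mathbf e)):\ v \text{ belongs to more than one } V(H_r(\mathbf e))\}\Big)
\begin{cases} = k/2 & \text{if } d = k \text{ and } l = 1,\\ \ge \tfrac12(d+l) & \text{otherwise,}\end{cases}
\end{aligned}
\]

where to get the second inequality we used the fact that each vertex of $H$ has degree two and the inclusion-exclusion formula. Thus we obtain

\[
\|\mathbb E D^k f(X)\|_{\{[k]\}} \le C_k\,n^{k/2},
\]

\[
\|\mathbb E D^d f(X)\|_{\mathcal J} \le C_k\,p^{k-d}\,n^{k-\frac{d+l}{2}} \quad \text{if } d < k \text{ or } l > 1.
\]

Together with (54) this yields the first inequality of the proposition. Using the fact that $\mathbb E Y_H(n,p) \ge \frac{1}{C_k}n^kp^k$, the second inequality follows by simple calculations. □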
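The vertex-counting step of this proof can be confirmed by exhaustive search for short cycles. The following sketch uses hypothetical names and is feasible only for small $k$; it enumerates all sequences of $d$ distinct edges of the $k$-cycle and all partitions of $[d]$, and checks that twice the quantity in question equals $k$ when $d = k$, $l = 1$, and is at least $d + l$ otherwise.

```python
from itertools import permutations


def set_partitions(items):
    """Yield all partitions of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for j in range(len(partition)):
            yield partition[:j] + [[first] + partition[j]] + partition[j + 1:]
        yield [[first]] + partition


def check_cycle(k):
    """Verify the inequality for every edge sequence and every partition, for the k-cycle."""
    edges = [frozenset((v, (v + 1) % k)) for v in range(k)]
    for d in range(1, k + 1):
        for e in permutations(edges, d):
            for blocks in set_partitions(list(range(d))):
                l = len(blocks)
                # vertex sets of the subgraphs H_r spanned by the edges of each block
                vertex_sets = [set().union(*(e[j] for j in block)) for block in blocks]
                v0 = set().union(*vertex_sets)
                shared = sum(1 for v in v0 if sum(v in s for s in vertex_sets) > 1)
                twice = len(v0) + shared  # 2 * (v(H_0) - #{vertices in exactly one V(H_r)} / 2)
                if d == k and l == 1:
                    if twice != k:
                        return False
                elif twice < d + l:
                    return False
    return True


results = {k: check_cycle(k) for k in (3, 4, 5)}
print(results)
```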
6 Refined inequalities for polynomials in independent random variables satisfying the modified log-Sobolev inequality

In this section we refine the inequalities which can be obtained from Theorem 3.3 for polynomials in independent random variables satisfying the $\beta$-modified log-Sobolev inequality (15) with $\beta > 2$. To this end we will use Theorem 3.4 together with a result from [2], which is a counterpart of Theorem 3.1 for homogeneous tetrahedral polynomials in general independent symmetric random variables with log-concave tails, however only of degree at most 3. We recall that for a set $I$, by $P_I$ we denote the family of partitions of $I$ into pairwise disjoint, nonempty sets.

Theorems 3.1, 3.2 and 3.4 from [2] specialized to Weibull variables can be translated into

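Theorem 6.1. Let $\alpha \in [1,2]$ and let $Y_1,\ldots,Y_n$ be a sequence of i.i.d. symmetric random variables satisfying $\mathbb P(|Y_i| \ge t) = \exp(-t^\alpha)$. Define $Y = (Y_1,\ldots,Y_n)$ and let $Z_1,\ldots,Z_d$ be independent copies of $Y$. Consider a $d$-indexed matrix $A$. Define also

\[
m_d(p,A) = \sum_{I\subseteq[d]}\ \sum_{\mathcal J\in P_I}\ \sum_{\mathcal K\in P_{[d]\setminus I}} p^{\#\mathcal J/2 + \#\mathcal K/\alpha}\,\|A\|_{\mathcal J|\mathcal K}, \qquad (62)
\]

where for $\mathcal J = \{J_1,\ldots,J_r\} \in P_I$ and $\mathcal K = \{K_1,\ldots,K_k\} \in P_{[d]\setminus I}$,

\[
\|A\|_{\mathcal J|\mathcal K} = \sum_{s_1\in K_1,\ldots,s_k\in K_k} \sup\Big\{\sum_{\mathbf i\in[n]^d} a_{\mathbf i}\prod_{l=1}^{r} x^{(l)}_{\mathbf i_{J_l}}\prod_{l=1}^{k} y^{(l)}_{\mathbf i_{K_l}}:\ \big\|(x^{(l)}_{\mathbf i_{J_l}})\big\|_2 \le 1 \text{ for } 1\le l\le r,\ \ \sum_{i_{s_l}\le n}\big\|(y^{(l)}_{\mathbf i_{K_l}})_{\mathbf i_{K_l\setminus\{s_l\}}}\big\|_2^\alpha \le 1 \text{ for } 1\le l\le k\Big\}.
\]

If $d \le 3$, then for any $p \ge 2$,

\[
C_d^{-1}\,m_d(p,A) \le \big\|\langle A, Z_1\otimes\cdots\otimes Z_d\rangle\big\|_p \le C_d\,m_d(p,A).
\]

Moreover, if $\alpha = 1$, then the above inequality holds for all $d \ge 1$.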

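Before we proceed, let us provide a few specific examples of the norms $\|A\|_{\mathcal J|\mathcal K}$, which for $\alpha < 2$ are more complicated than in the Gaussian case. In what follows, $\beta = \frac{\alpha}{\alpha-1}$ (with $\beta = \infty$ for $\alpha = 1$). For $d = 1$,

\[
\|(a_i)\|_{\{1\}|\emptyset} = \sup\Big\{\sum a_ix_i:\ \sum x_i^2 \le 1\Big\} = |(a_i)|_2,
\]
\[
\|(a_i)\|_{\emptyset|\{1\}} = \sup\Big\{\sum a_iy_i:\ \sum |y_i|^\alpha \le 1\Big\} = |(a_i)|_\beta.
\]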
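The second identity is the usual $\ell_\alpha$-$\ell_\beta$ duality, and the supremum is attained at an explicit vector. The sketch below (hypothetical names, arbitrary test data, valid for $1 < \alpha \le 2$ only) constructs that extremal $y$ and checks both the constraint and the value of the linear form.

```python
from math import isclose


def dual_exponent(alpha):
    """beta = alpha / (alpha - 1), for 1 < alpha <= 2."""
    return alpha / (alpha - 1.0)


def l_norm(vec, r):
    """The l_r norm of a finite sequence."""
    return sum(abs(c) ** r for c in vec) ** (1.0 / r)


def extremal_y(a, alpha):
    """Vector y with sum |y_i|^alpha = 1 attaining sup sum a_i y_i = |a|_beta."""
    beta = dual_exponent(alpha)
    scale = l_norm(a, beta) ** (beta - 1.0)
    return [(1.0 if c >= 0 else -1.0) * abs(c) ** (beta - 1.0) / scale for c in a]


a = [3.0, -1.0, 2.0, 0.5]  # illustrative coefficients only
alpha = 1.5                # so that beta = 3
y = extremal_y(a, alpha)
constraint = sum(abs(c) ** alpha for c in y)
value = sum(ai * yi for ai, yi in zip(a, y))
print(constraint, value, l_norm(a, dual_exponent(alpha)))
```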
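For $d = 2$, $\|(a_{ij})\|_{\{1,2\}|\emptyset} = \|(a_{ij})\|_{HS}$, $\|(a_{ij})\|_{\{1\}\{2\}|\emptyset} = \|(a_{ij})\|_{\ell_2\to\ell_2}$,

\[
\|(a_{ij})\|_{\{1\}|\{2\}} = \sup\Big\{\sum a_{ij}x_iy_j:\ \sum x_i^2 \le 1,\ \sum |y_j|^\alpha \le 1\Big\} = \|(a_{ij})\|_{\ell_\alpha\to\ell_2},
\]
\[
\|(a_{ij})\|_{\{2\}|\{1\}} = \sup\Big\{\sum a_{ij}y_ix_j:\ \sum x_j^2 \le 1,\ \sum |y_i|^\alpha \le 1\Big\} = \|(a_{ij})\|_{\ell_2\to\ell_\beta},
\]
\[
\|(a_{ij})\|_{\emptyset|\{1\}\{2\}} = \sup\Big\{\sum a_{ij}y_iz_j:\ \sum |y_i|^\alpha \le 1,\ \sum |z_j|^\alpha \le 1\Big\} = \|(a_{ij})\|_{\ell_\alpha\to\ell_\beta},
\]

and

\[
\begin{aligned}
\|(a_{ij})\|_{\emptyset|\{1,2\}} &= \sup\Big\{\sum a_{ij}y_{ij}:\ \sum_i\Big(\sum_j y_{ij}^2\Big)^{\alpha/2} \le 1\Big\} + \sup\Big\{\sum a_{ij}y_{ij}:\ \sum_j\Big(\sum_i y_{ij}^2\Big)^{\alpha/2} \le 1\Big\} \\
&= \Big(\sum_i\Big(\sum_j a_{ij}^2\Big)^{\beta/2}\Big)^{1/\beta} + \Big(\sum_j\Big(\sum_i a_{ij}^2\Big)^{\beta/2}\Big)^{1/\beta}.
\end{aligned}
\]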
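For $d = 3$, we have, for example,

\[
\|(a_{ijk})\|_{\{2\}|\{1\}\{3\}} = \sup\Big\{\sum a_{ijk}y_ix_jz_k:\ \sum x_j^2 \le 1,\ \sum |y_i|^\alpha \le 1,\ \sum |z_k|^\alpha \le 1\Big\},
\]
\[
\begin{aligned}
\|(a_{ijk})\|_{\{2\}|\{1,3\}} &= \sup\Big\{\sum a_{ijk}x_jy_{ik}:\ \sum x_j^2 \le 1,\ \sum_i\Big(\sum_k y_{ik}^2\Big)^{\alpha/2} \le 1\Big\} \\
&\quad + \sup\Big\{\sum a_{ijk}x_jy_{ik}:\ \sum x_j^2 \le 1,\ \sum_k\Big(\sum_i y_{ik}^2\Big)^{\alpha/2} \le 1\Big\},
\end{aligned}
\]
\[
\begin{aligned}
\|(a_{ijk})\|_{\emptyset|\{1\}\{2,3\}} &= \sup\Big\{\sum a_{ijk}y_iz_{jk}:\ \sum |y_i|^\alpha \le 1,\ \sum_j\Big(\sum_k z_{jk}^2\Big)^{\alpha/2} \le 1\Big\} \\
&\quad + \sup\Big\{\sum a_{ijk}y_iz_{jk}:\ \sum |y_i|^\alpha \le 1,\ \sum_k\Big(\sum_j z_{jk}^2\Big)^{\alpha/2} \le 1\Big\}.
\end{aligned}
\]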

In particular, from Theorem 6.1 it follows that for $\alpha \in [1,2]$, if $Y = (Y_1,\ldots,Y_n)$ is as in Theorem 6.1 then for every $x \in \mathbb R^n$ and $p \ge 2$,

\[
\frac1C\big(\sqrt p\,|x|_2 + p^{1/\alpha}|x|_\beta\big) \le \|\langle x, Y\rangle\|_p \le C\big(\sqrt p\,|x|_2 + p^{1/\alpha}|x|_\beta\big),
\]

where $|\cdot|_r$ stands for the $\ell_r$ norm (see also [28]). Thus, for $\beta \in (2,\infty)$, the inequality of Theorem 3.4, for $m = n$, $k = 1$ and a $C^1$ function $f: \mathbb R^n \to \mathbb R$, can be written in the form

\[
\|f(X) - \mathbb E f(X)\|_p \le C_\beta\,\|\langle\nabla f(X), Y\rangle\|_p. \qquad (63)
\]

This allows for induction, just as in the proof of Proposition 3.2, except that instead of Gaussian vectors we will have independent copies of $Y$. We can thus repeat the proof of Theorem 3.3, using the above observation and Theorem 6.1 instead of Theorem 3.1. This argument will then yield the following proposition, which is a counterpart of Theorem 3.3. At the moment we can prove it only for $D \le 3$; clearly, generalizing Theorem 6.1 to chaoses of arbitrary degree would immediately imply it for general $D$.

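Proposition 6.2. Let $X = (X_1,\ldots,X_n)$ be a random vector in $\mathbb R^n$ with independent components. Let $\beta \in (2,\infty)$ and assume that for all $i \le n$, $X_i$ satisfies the $\beta$-modified logarithmic Sobolev inequality with constant $D_{LS_\beta}$. Let $f: \mathbb R^n \to \mathbb R$ be a $C^D$ function. Define

\[
m(p,f) = \big\|m_D(p, D^Df(X))\big\|_p + \sum_{1\le d\le D-1} m_d\big(p, \mathbb E D^d f(X)\big),
\]

where $m_d(p,A)$ is defined by (62) with $\alpha = \frac{\beta}{\beta-1}$. If $D \le 3$ then for $p \ge 2$,

\[
\|f(X) - \mathbb E f(X)\|_p \le C_{\beta,D_{LS_\beta}}\,m(p,f).
\]

As a consequence, for all $p \ge 2$,

\[
\mathbb P\big(|f(X) - \mathbb E f(X)| \ge C_{\beta,D_{LS_\beta}}\,m(p,f)\big) \le e^{-p}.
\]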

Remarks

1. For $\beta = 2$, the estimates of the above proposition agree with those of Theorem 1.2. For $\beta > 2$ it improves on what can be obtained from Theorem 3.3 in two aspects (of course just for $D \le 3$). First, the exponent of $p$ is smaller, as $(\gamma - 1/2)d + \#(\mathcal J\cup\mathcal K)/2 = (1/\alpha - 1/2)d + \#\mathcal J/2 + \#\mathcal K/2 \ge \#\mathcal J/2 + \#\mathcal K/\alpha$. Second, $\|A\|_{\mathcal J\cup\mathcal K} \ge \|A\|_{\mathcal J|\mathcal K}$ (since for $\alpha < 2$, $|x|_\alpha \ge |x|_2$, so the supremum on the left hand side is taken over a larger set).

2. From results in [2] it follows that if $f$ is a tetrahedral polynomial of degree $D$ and $X_i$ are i.i.d. symmetric random variables satisfying $\mathbb P(|X_i| \ge t) = \exp(-t^\alpha)$, then the inequalities of Proposition 6.2 can be reversed (up to constants), i.e.,

\[
\|f(X) - \mathbb E f(X)\|_p \ge \frac{1}{C_D}\,m(p,f).
\]

This is true for any positive integer $D$.

3. One can also consider another functional inequality, which may be regarded as a counterpart of (15) for $\beta = \infty$. We say that a random vector $X$ in $\mathbb R^n$ satisfies the Bobkov-Ledoux inequality if for all locally Lipschitz positive functions $f$ such that $|\nabla f(x)|_\infty := \max_{1\le i\le n}|\partial_i f(x)| \le d_{BL}f(x)$ for all $x$,

\[
\mathrm{Ent}\,f^2(X) \le D_{BL}\,\mathbb E|\nabla f(X)|^2. \qquad (64)
\]

This inequality has been introduced in [9] to provide a simple proof of Talagrand's two-level concentration for the symmetric exponential measure in $\mathbb R^n$. Here $|\partial_i f(x)|$ is defined as the "partial length of gradient" (see (12)). Thus in the case of differentiable functions $|\nabla f|_\infty$ coincides with the $\ell_\infty$ norm of the "true" gradient.

In view of Theorem 3.4 it is natural to conjecture that the Bobkov-Ledoux inequality implies

\[
\|f(X) - \mathbb E f(X)\|_p \le C\Big(\sqrt p\,\big\||\nabla f(X)|\big\|_p + p\,\big\||\nabla f(X)|_\infty\big\|_p\Big), \qquad (65)
\]

which in turn implies (63) with $Y = (Y_1,\ldots,Y_n)$ being a vector of independent symmetric exponential variables and some $C_\infty < \infty$. This would yield an analogue of Proposition 6.2 for $\beta = \infty$, this time with no restriction on $D$.

Unfortunately at present we do not know whether the implication (64) $\Longrightarrow$ (65) holds true, or even whether (65) holds for the symmetric exponential measure in $\mathbb R^n$. We are only able to prove the following weaker inequality, which is however not sufficient to obtain a counterpart of Proposition 6.2 for $\beta = \infty$.

Proposition 6.3. If X is a random vector in M", which satisfies (64), then for any locally 
Lipschitz function f : M" — )■ M, and any p >2, 

\\f{X)-Ef{X%<3(D]il^\\\Vf{X)\\\+d-^lp\\\Vf{X)\^\[ 



Proof. To simplify the notation we suppress the argument $X$. In what follows $\|\cdot\|_p$ denotes the $L_p$ norm with respect to the distribution of $X$.

Let us fix $p \ge 2$ and consider $f_1 = \max(f, \|f\|_p/2)$. We have

ll/i||p>ll/IU (66) 

||/l||2<^||/||p + ||/||2, 

||/i||p<^||/||p<3min/i. 

Moreover, $f_1$ is locally Lipschitz and we have pointwise estimates $|\nabla f_1| \le |\nabla f|$, $|\nabla f_1|_\infty \le |\nabla f|_\infty$. Assume now that we have proved that



fillp < ll/ilb + \/^VP|||V/i|||, + ^lllV/i|oo|L. (67) 
Then, together with the first two inequalities of (66), it yields



p<||/l||p<||/l||2 + y^ 



DbL ^|||v7^ III , ^^ lllVTJ^ I I 



1 
< - 

- 2 



+ 



. + ,/^v^l||V/||t + ^|||V/|, 



which gives 



< 2 



2 + A/^V^|||V/||| +;^|||V/|c 



(68) 



2 vx-iM-Mlp 2dBL 

Since (64) implies the Poincare inequality with constant Dbl/'^ (see e.g. Proposition 2.3 in 



[27]), we can conclude the proof applying (68) to |/ — E/| (similarly as in the proof of Theorem 
3.4). Thus it is enough to prove (67). 

From now on we are going to work with the function /i only, so for brevity we will drop the 



subscript and write / instead of /i. Assume 



> 



3p 



ll|V/|c 



IP - 2dBL 

satisfied). Then, using the third inequality of (66), for 2 < t < p and all x G 



(otherwise (67) is trivially 



X . 



We can thus apply (64) with /*'^, which together with Holder's inequality gives 
Now, as in the proof of Theorem 3.4, we have 



di 



(E/*)2/* 



|(E/)|-iEnt/*<^|||V/|||J, 



which upon integrating gives 



< 



2 , ^BL |||v7i^|||2 

2 + ^^p\\Nf\\L 



which clearly implies (67). 
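7 Appendix

7.1 Decoupling inequalities

Let us here state the main decoupling result for U-statistics (Theorem 1 in [24]).

Theorem 7.1. For natural numbers $n \ge d$ let $(X_i)_{i=1}^n$ be a sequence of independent random variables with values in a measurable space $(S,\mathcal S)$ and let $(X_i^{(j)})_{i=1}^n$, $j = 1,\ldots,d$, be $d$ independent copies of this sequence. Let $B$ be a separable Banach space and for each $\mathbf i \in [n]^{\underline d}$ let $h_{\mathbf i}: S^d \to B$ be a measurable function. Then for all $t > 0$,

\[
\mathbb P\Big(\Big\|\sum_{\mathbf i\in[n]^{\underline d}} h_{\mathbf i}(X_{i_1},\ldots,X_{i_d})\Big\| > t\Big) \le C_d\,\mathbb P\Big(\Big\|\sum_{\mathbf i\in[n]^{\underline d}} h_{\mathbf i}(X^{(1)}_{i_1},\ldots,X^{(d)}_{i_d})\Big\| > t/C_d\Big).
\]

In consequence for all $p \ge 1$,

\[
\Big\|\sum_{\mathbf i\in[n]^{\underline d}} h_{\mathbf i}(X_{i_1},\ldots,X_{i_d})\Big\|_p \le C_d\,\Big\|\sum_{\mathbf i\in[n]^{\underline d}} h_{\mathbf i}(X^{(1)}_{i_1},\ldots,X^{(d)}_{i_d})\Big\|_p.
\]

If moreover the functions $h_{\mathbf i}$ are symmetric in the sense that, for all $x_1,\ldots,x_d \in S$ and all permutations $\pi: [d] \to [d]$, $h_{i_1,\ldots,i_d}(x_1,\ldots,x_d) = h_{i_{\pi_1},\ldots,i_{\pi_d}}(x_{\pi_1},\ldots,x_{\pi_d})$, then for all $t > 0$,

\[
\mathbb P\Big(\Big\|\sum_{\mathbf i\in[n]^{\underline d}} h_{\mathbf i}(X^{(1)}_{i_1},\ldots,X^{(d)}_{i_d})\Big\| > t\Big) \le C_d\,\mathbb P\Big(\Big\|\sum_{\mathbf i\in[n]^{\underline d}} h_{\mathbf i}(X_{i_1},\ldots,X_{i_d})\Big\| > t/C_d\Big)
\]

and in consequence for all $p \ge 1$,

\[
\Big\|\sum_{\mathbf i\in[n]^{\underline d}} h_{\mathbf i}(X^{(1)}_{i_1},\ldots,X^{(d)}_{i_d})\Big\|_p \le C_d\,\Big\|\sum_{\mathbf i\in[n]^{\underline d}} h_{\mathbf i}(X_{i_1},\ldots,X_{i_d})\Big\|_p.
\]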

7.2 Proof of Lemma 5.4 

Without loss of generality we can assume that $M = 1$. It is easy to see that for some constant $C_k$ and $t \ge 1$, $\mathbb P(C_k|g_{i1}\cdots g_{ik}| \ge t) \ge 2\exp(-t^{2/k})$. Since $\mathbb P(|Y_i| \ge t) \le 2\exp(-t^{2/k})$, we get

\[
\mathbb P\big(|Y_i|\mathbf 1_{\{|Y_i|\ge 1\}} \ge t\big) \le \mathbb P\big(C_k|g_{i1}\cdots g_{ik}| \ge t\big).
\]

Therefore, using the inverse of the distribution function, we can define copies $\tilde Y_i$ of $|Y_i|\mathbf 1_{\{|Y_i|\ge 1\}}$ and i.i.d. copies $\tilde Z_i$ of $|g_{i1}\cdots g_{ik}|$, such that $\tilde Y_i \le C_k\tilde Z_i$ pointwise. We may assume that these copies are defined on a common probability space with $Y_i$ and $g_{ij}$. We can now write for a sequence $\varepsilon_i$ of i.i.d. Rademacher variables independent of all the variables introduced so far,



' «MIP 



' «llp 



/ ^ fl j -^ i 1 1 p — II / ^ 0,1^1 1 -i i I 

n n 

< II 2^ajej|yil{|y.|<i}|||p + II y ^aiei\Yil^\Y^\>i}\\\p 

i=l i=l 

n n 

= II 2.^0j^j|^il{|yi|<i}lllp + II / ^ai£iyi\ 

i=l i- 

n n 

^ II /^Oj^jllp + C'fcll / ^ ajSiZj 
i=l i=l 

n n 

^ CkW y^aiEiEzZiWp + Cfc II y ^ajEiZi 

1=1 i=l 

n n 

— ^k\\ / _, O'i^iZiWp = Cfc II y ^ aigi^ ■ ■ ■ 5'jfcllp) 



i=l 



■'tWp 



-'tWp 



i=l 



i=l 



where in the second inequality we used the contraction principle (once conditionally on the $\tilde Y_i$'s and $\tilde Z_i$'s) and in the third one Jensen's inequality.